WorldWideScience

Sample records for two-sample test scores

  1. Power and sample size evaluation for the Cochran-Mantel-Haenszel mean score (Wilcoxon rank sum) test and the Cochran-Armitage test for trend.

    Science.gov (United States)

    Lachin, John M

    2011-11-10

    The power of a chi-square test, and thus the required sample size, is a function of the noncentrality parameter that can be obtained as the limiting expectation of the test statistic under an alternative hypothesis specification. Herein, we apply this principle to derive simple expressions for two tests that are commonly applied to discrete ordinal data. The Wilcoxon rank sum test for the equality of distributions in two groups is algebraically equivalent to the Mann-Whitney test. The Kruskal-Wallis test applies to multiple groups. These tests are equivalent to a Cochran-Mantel-Haenszel mean score test using rank scores for a set of C-discrete categories. Although various authors have assessed the power function of the Wilcoxon and Mann-Whitney tests, herein it is shown that the power of these tests with discrete observations, that is, with tied ranks, is readily provided by the power function of the corresponding Cochran-Mantel-Haenszel mean scores test for two and R > 2 groups. These expressions yield results virtually identical to those derived previously for rank scores and also apply to other score functions. The Cochran-Armitage test for trend assesses whether there is a monotonically increasing or decreasing trend in the proportions with a positive outcome or response over the C-ordered categories of an ordinal independent variable, for example, dose. Herein, it is shown that the power of the test is a function of the slope of the response probabilities over the ordinal scores assigned to the groups that yields simple expressions for the power of the test. Copyright © 2011 John Wiley & Sons, Ltd.
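
    As a hedged illustration of the principle stated above (power as a function of the noncentrality parameter), the following Python sketch covers only the 1-degree-of-freedom case, such as a two-group comparison, where the noncentral chi-square tail reduces to two normal tails. The function names and the bisection bounds are assumptions made for this example and are not taken from the paper.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def chi2_power_1df(noncentrality: float, alpha: float = 0.05) -> float:
    """Power of a 1-df chi-square test for a given noncentrality parameter.

    A 1-df noncentral chi-square variable is X**2 with X ~ N(sqrt(ncp), 1),
    so the rejection probability is the sum of two normal tail areas.
    """
    z_crit = _STD.inv_cdf(1 - alpha / 2)  # square root of the chi-square critical value
    root = noncentrality ** 0.5
    upper_tail = 1 - _STD.cdf(z_crit - root)  # P(X > z_crit)
    lower_tail = _STD.cdf(-z_crit - root)     # P(X < -z_crit)
    return upper_tail + lower_tail


def noncentrality_for_power(power: float, alpha: float = 0.05) -> float:
    """Noncentrality needed to reach the target power (simple bisection).

    The search interval [0, 100] is an assumption that is wide enough
    for typical alpha and power choices.
    """
    lo, hi = 0.0, 100.0
    while hi - lo > 1e-10:
        mid = (lo + hi) / 2
        if chi2_power_1df(mid, alpha) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

    Because the noncentrality parameter grows with the total sample size for a fixed alternative, a value returned by `noncentrality_for_power` can be converted into a required sample size once the per-subject contribution is known from the chosen alternative.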

  2. What Do Test Scores Really Mean? A Latent Class Analysis of Danish Test Score Performance

    DEFF Research Database (Denmark)

    Munk, Martin D.; McIntosh, James

    2014-01-01

    Latent class Poisson count models are used to analyze a sample of Danish test score results from a cohort of individuals born in 1954-55, tested in 1968, and followed until 2011. The procedure takes account of unobservable effects as well as excessive zeros in the data. We show that the test scores...... of intelligence explain a significant proportion of the variation in test scores. This adds to the complexity of interpreting test scores and suggests that school culture and possible incentive problems make it more difficult to understand what the tests measure....

  3. Association between the gait pattern characteristics of older people and their two-step test scores.

    Science.gov (United States)

    Kobayashi, Yoshiyuki; Ogata, Toru

    2018-04-27

    The Two-Step test is one of three official tests authorized by the Japanese Orthopedic Association to evaluate the risk of locomotive syndrome (a condition of reduced mobility caused by an impairment of the locomotive organs). It has been reported that the Two-Step test score has a good correlation with one's walking ability; however, its association with the gait pattern of older people during normal walking is still unknown. Therefore, this study aims to clarify the associations between the gait patterns of older people observed during normal walking and their Two-Step test scores. We analyzed the whole waveforms obtained from the lower-extremity joint angles and joint moments of 26 older people in various stages of locomotive syndrome using principal component analysis (PCA). The PCA was conducted using a 260 × 2424 input matrix constructed from the participants' time-normalized pelvic and right-lower-limb-joint angles along three axes (ten trials of 26 participants, 101 time points, 4 angles, 3 axes, and 2 variable types per trial). The Pearson product-moment correlation coefficient between the scores of the principal component vectors (PCVs) and the scores of the Two-Step test revealed that only one PCV (PCV 2) among the 61 obtained relevant PCVs is significantly related to the score of the Two-Step test. We therefore concluded that the joint angles and joint moments related to PCV 2 (ankle plantar-flexion, ankle plantar-flexor moments during the late stance phase, ranges of motion and moments on the hip, knee, and ankle joints in the sagittal plane during the entire stance phase) are the motions associated with the Two-Step test.

  4. A weighted generalized score statistic for comparison of predictive values of diagnostic tests.

    Science.gov (United States)

    Kosinski, Andrzej S

    2013-03-15

    Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.

  5. Prediction of true test scores from observed item scores and ancillary data.

    Science.gov (United States)

    Haberman, Shelby J; Yao, Lili; Sinharay, Sandip

    2015-05-01

    In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE® General Analytical Writing and until 2009 in the case of TOEFL® iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater®. In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability. © 2015 The British Psychological Society.

  6. Exploring a Source of Uneven Score Equity across the Test Score Range

    Science.gov (United States)

    Huggins-Manley, Anne Corinne; Qiu, Yuxi; Penfield, Randall D.

    2018-01-01

    Score equity assessment (SEA) refers to an examination of population invariance of equating across two or more subpopulations of test examinees. Previous SEA studies have shown that score equity may be present for examinees scoring at particular test score ranges but absent for examinees scoring at other score ranges. No studies to date have…

  7. Forward selection two sample binomial test

    Science.gov (United States)

    Wong, Kam-Fai; Wong, Weng-Kee; Lin, Miao-Shan

    2016-01-01

    Fisher’s exact test (FET) is a conditional method that is frequently used to analyze data in a 2 × 2 table for small samples. This test is conservative and attempts have been made to modify the test to make it less conservative. For example, Crans and Shuster (2008) proposed adding more points in the rejection region to make the test more powerful. We provide another way to modify the test to make it less conservative by using two independent binomial distributions as the reference distribution for the test statistic. We compare our new test with several methods and show that our test has advantages over existing methods in terms of control of the type 1 and type 2 errors. We reanalyze results from an oncology trial using our proposed method and our software which is freely available to the reader. PMID:27335577
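
    For orientation, the conventional two-sided Fisher's exact test that the abstract takes as its starting point can be sketched in Python as follows. This is the standard conditional test, not the authors' proposed modification; the function name and the small tolerance used to handle floating-point ties are assumptions for this example.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]].

    Conditions on both margins, so the top-left cell follows a
    hypergeometric distribution under the null hypothesis.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The factor (1 + 1e-9) is an assumed tolerance for floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
```

    Because the hypergeometric distribution is discrete, the attained significance level of this test is usually below the nominal level, which is the conservativeness the abstract refers to.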

  8. The quantitative LOD score: test statistic and sample size for exclusion and linkage of quantitative traits in human sibships.

    Science.gov (United States)

    Page, G P; Amos, C I; Boerwinkle, E

    1998-04-01

    We present a test statistic, the quantitative LOD (QLOD) score, for the testing of both linkage and exclusion of quantitative-trait loci in randomly selected human sibships. As with the traditional LOD score, the boundary values of 3, for linkage, and -2, for exclusion, can be used for the QLOD score. We investigated the sample sizes required for inferring exclusion and linkage, for various combinations of linked genetic variance, total heritability, recombination distance, and sibship size, using fixed-size sampling. The sample sizes required for both linkage and exclusion were not qualitatively different and depended on the percentage of variance being linked or excluded and on the total genetic variance. Information regarding linkage and exclusion in sibships larger than size 2 increased as approximately all possible pairs n(n-1)/2 up to sibships of size 6. Increasing the recombination (theta) distance between the marker and the trait loci reduced empirically the power for both linkage and exclusion, as a function of approximately (1-2theta)^4.
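
    The two approximate scaling rules quoted in the abstract, n(n-1)/2 sib pairs per sibship and a power factor of roughly (1 - 2 theta) to the fourth power, can be written out as a small Python sketch. The function names are hypothetical, and the abstract presents both relations as approximations only.

```python
def sib_pairs(sibship_size: int) -> int:
    """Number of distinct sib pairs in a sibship: n(n-1)/2."""
    return sibship_size * (sibship_size - 1) // 2


def recombination_factor(theta: float) -> float:
    """Approximate attenuation factor (1 - 2*theta)**4.

    theta is the recombination fraction between marker and trait locus,
    ranging from 0 (complete linkage) to 0.5 (no linkage).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    return (1 - 2 * theta) ** 4
```

    Under these approximations, a sibship of size 6 contributes 15 pairs, and a recombination fraction of 0.1 scales the factor to about 0.41 of its value at theta = 0.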

  9. A two-sample Bayesian t-test for microarray data

    Directory of Open Access Journals (Sweden)

    Dimmic Matthew W

    2006-03-01

    Background: Determining whether a gene is differentially expressed in two different samples remains an important statistical problem. Prior work in this area has featured the use of t-tests with pooled estimates of the sample variance based on similarly expressed genes. These methods do not display consistent behavior across the entire range of pooling and can be biased when the prior hyperparameters are specified heuristically. Results: A two-sample Bayesian t-test is proposed for use in determining whether a gene is differentially expressed in two different samples. The test method is an extension of earlier work that made use of point estimates for the variance. The method proposed here explicitly calculates in analytic form the marginal distribution for the difference in the mean expression of two samples, obviating the need for point estimates of the variance without recourse to posterior simulation. The prior distribution involves a single hyperparameter that can be calculated in a statistically rigorous manner, making clear the connection between the prior degrees of freedom and prior variance. Conclusion: The test is easy to understand and implement and application to both real and simulated data shows that the method has equal or greater power compared to the previous method and demonstrates consistent Type I error rates. The test is generally applicable outside the microarray field to any situation where prior information about the variance is available and is not limited to cases where estimates of the variance are based on many similar observations.

  10. On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests

    Directory of Open Access Journals (Sweden)

    Aaditya Ramdas

    2017-01-01

    Nonparametric two-sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having been designed and analyzed, both for the unidimensional and the multivariate setting. In this short survey, we focus on test statistics that involve the Wasserstein distance. Using an entropic smoothing of the Wasserstein distance, we connect these to very different tests including multivariate methods involving energy statistics and kernel based maximum mean discrepancy and univariate methods like the Kolmogorov–Smirnov test, probability or quantile (PP/QQ) plots and receiver operating characteristic or ordinal dominance (ROC/ODC) curves. Some observations are implicit in the literature, while others seem to have not been noticed thus far. Given nonparametric two-sample testing’s classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.
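
    To make two of the univariate statistics named in the abstract concrete, the Python sketch below computes the empirical 1-Wasserstein distance and the two-sample Kolmogorov-Smirnov statistic. It does not implement the entropic smoothing discussed in the survey; the function names and the equal-sample-size restriction in the Wasserstein helper are simplifying assumptions for this example.

```python
from bisect import bisect_right


def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size samples.

    In one dimension the optimal coupling matches order statistics, so the
    distance is the mean absolute difference of the sorted values.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic.

    Largest absolute gap between the two empirical distribution functions,
    evaluated at every observed value.
    """
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    return max(
        abs(bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys))
        for t in points
    )
```

    Both statistics compare the two empirical distribution functions: the Wasserstein distance integrates the gap between them, while the Kolmogorov-Smirnov statistic takes its maximum.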

  11. Testing Homogeneity in a Semiparametric Two-Sample Problem

    Directory of Open Access Journals (Sweden)

    Yukun Liu

    2012-01-01

    We study a two-sample homogeneity testing problem, in which one sample comes from a population with density f(x) and the other is from a mixture population with mixture density (1−λ)f(x)+λg(x). This problem arises naturally from many statistical applications such as tests for partial differential gene expression in microarray studies or genetic studies for gene mutation. Under the semiparametric assumption g(x)=f(x)e^(α+βx), a penalized empirical likelihood ratio test could be constructed, but its implementation is hindered by the fact that there is neither a feasible algorithm for computing the test statistic nor available research results on its theoretical properties. To circumvent these difficulties, we propose an EM test based on the penalized empirical likelihood. We prove that the EM test has a simple chi-square limiting distribution, and we also demonstrate its competitive testing performances by simulations. A real-data example is used to illustrate the proposed methodology.
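
    The model structure, a mixture density (1−λ)f(x)+λg(x) with g obtained from f by the exponential tilt e^(α+βx), can be illustrated with a short Python sketch. Choosing f as the standard normal density is purely an assumption for this example (the paper leaves f unspecified); under that choice g integrates to 1 only when α = −β²/2, which makes g the N(β, 1) density. The sketch shows the densities only and does not implement the EM test.

```python
from math import exp, pi, sqrt


def f(x: float) -> float:
    """Baseline density; assumed standard normal for this illustration."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)


def g(x: float, beta: float) -> float:
    """Exponentially tilted density g(x) = f(x) * exp(alpha + beta * x).

    alpha is fixed at -beta**2 / 2 so that g integrates to 1 when f is N(0, 1).
    """
    alpha = -0.5 * beta * beta
    return f(x) * exp(alpha + beta * x)


def mixture(x: float, lam: float, beta: float) -> float:
    """Mixture density (1 - lam) * f(x) + lam * g(x)."""
    return (1 - lam) * f(x) + lam * g(x, beta)


def integrate(fn, lo: float = -12.0, hi: float = 12.0, steps: int = 24000) -> float:
    """Trapezoidal rule, used here only to check that a density integrates to 1."""
    h = (hi - lo) / steps
    inner = sum(fn(lo + i * h) for i in range(1, steps))
    return h * (0.5 * fn(lo) + inner + 0.5 * fn(hi))
```

    The homogeneity hypothesis corresponds to the two samples sharing one density, which in this parameterization happens when λ = 0 or when the tilt is trivial (β = 0).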

  12. SKATE: a docking program that decouples systematic sampling from scoring.

    Science.gov (United States)

    Feng, Jianwen A; Marshall, Garland R

    2010-11-15

    SKATE is a docking prototype that decouples systematic sampling from scoring. This novel approach removes any interdependence between sampling and scoring functions to achieve better sampling and, thus, improves docking accuracy. SKATE systematically samples a ligand's conformational, rotational and translational degrees of freedom, as constrained by a receptor pocket, to find sterically allowed poses. Efficient systematic sampling is achieved by pruning the combinatorial tree using aggregate assembly, discriminant analysis, adaptive sampling, radial sampling, and clustering. Because systematic sampling is decoupled from scoring, the poses generated by SKATE can be ranked by any published, or in-house, scoring function. To test the performance of SKATE, ligands from the Astex/CDCC set, the Surflex set, and the Vertex set, a total of 266 complexes, were redocked to their respective receptors. The results show that SKATE was able to sample poses within 2 Å RMSD of the native structure for 98, 95, and 98% of the cases in the Astex/CDCC, Surflex, and Vertex sets, respectively. Cross-docking accuracy of SKATE was also assessed by docking 10 ligands to thymidine kinase and 73 ligands to cyclin-dependent kinase. © 2010 Wiley Periodicals, Inc.

  13. Relationships between narrative language samples and norm-referenced test scores in language assessments of school-age children.

    Science.gov (United States)

    Danahy Ebert, Kerry; Scott, Cheryl M

    2014-10-01

    Both narrative language samples and norm-referenced language tests can be important components of language assessment for school-age children. The present study explored the relationship between these 2 tools within a group of children referred for language assessment. The study is a retrospective analysis of clinical records from 73 school-age children. Participants had completed an oral narrative language sample and at least one norm-referenced language test. Correlations between microstructural language sample measures and norm-referenced test scores were compared for younger (6- to 8-year-old) and older (9- to 12-year-old) children. Contingency tables were constructed to compare the 2 types of tools, at 2 different cutpoints, in terms of which children were identified as having a language disorder. Correlations between narrative language sample measures and norm-referenced tests were stronger for the younger group than the older group. Within the younger group, the level of language assessed by each measure contributed to associations among measures. Contingency analyses revealed moderate overlap in the children identified by each tool, with agreement affected by the cutpoint used. Narrative language samples may complement norm-referenced tests well, but age combined with narrative task can be expected to influence the nature of the relationship.

  14. A Human Capital Model of Educational Test Scores

    DEFF Research Database (Denmark)

    McIntosh, James; D. Munk, Martin

    Latent class Poisson count models are used to analyze a sample of Danish test score results from a cohort of individuals born in 1954-55 and tested in 1968. The procedure takes account of unobservable effects as well as excessive zeros in the data. The bulk of unobservable effects are uncorrelated...... with observable parental attributes and, thus, are environmental rather than genetic in origin. We show that the test scores measure manifest or measured ability as it has evolved over the life of the respondent and is, thus, more a product of the human capital formation process than some latent or fundamental...... measure of pure cognitive ability. We find that variables which are not closely associated with traditional notions of intelligence explain a significant proportion of the variation in test scores. This adds to the complexity of interpreting test scores and suggests that school culture, attitudes...

  15. The Effect of Mock Tests on Iranian EFL learners’ Test Scores

    Directory of Open Access Journals (Sweden)

    Hossein Khodabakhshzadeh

    2016-07-01

    The effect of using tests in test preparation courses has been subject to debate. While some scholars such as Yang and Badger (2015) believe it is a cause of positive washback effect, others argue that this issue is tentative and context-bound (Green, 2007). Therefore, this study investigated the effect of using Mock tests in International English Language Testing System (IELTS) preparation courses on students’ overall IELTS scores. Fifty-one IELTS students were selected non-randomly through the quota sampling approach out of 76 students at Mahan Language Institute in Birjand, Iran. These participants were distributed into Group 1 (n=25) and Group 2 (n=26). A complete IELTS test was administered to ensure that the groups were homogeneous and to serve as pretest. After 10 sessions of intervention, a different IELTS test was administered as posttest. The results of between-subjects analysis through an independent samples t-test revealed that using Mock tests in the IELTS preparation courses can positively affect the participants’ scores on the IELTS exam. Pedagogical implications are discussed.
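
    The independent samples t-test named in the abstract can be sketched in Python as below. The sketch uses the classical pooled-variance form, which is an assumption here because the abstract does not say which variant was applied; the function name is hypothetical and no study data are reproduced.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group1, group2):
    """Independent samples t statistic with pooled variance.

    Returns the t statistic and its degrees of freedom (n1 + n2 - 2).
    Assumes equal population variances in the two groups.
    """
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error, df
```

    With groups of 25 and 26 participants, as in the study design, the statistic would be referred to a t distribution with 49 degrees of freedom.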

  16. What do educational test scores really measure?

    DEFF Research Database (Denmark)

    McIntosh, James; D. Munk, Martin

    Latent class Poisson count models are used to analyze a sample of Danish test score results from a cohort of individuals born in 1954-55 and tested in 1968. The procedure takes account of unobservable effects as well as excessive zeros in the data. The bulk of unobservable effects are uncorrelate......, and possible incentive problems make it more difficult to elicit true values of what the tests measure....

  17. Modeling Floor Effects in Standardized Vocabulary Test Scores in a Sample of Low SES Hispanic Preschool Children under the Multilevel Structural Equation Modeling Framework

    Directory of Open Access Journals (Sweden)

    Leina Zhu

    2017-12-01

    Researchers and practitioners often use standardized vocabulary tests such as the Peabody Picture Vocabulary Test-4 (PPVT-4; Dunn and Dunn, 2007) and its companion, the Expressive Vocabulary Test-2 (EVT-2; Williams, 2007), to assess English vocabulary skills as an indicator of children's school readiness. Despite their psychometric excellence in the norm sample, issues arise when standardized vocabulary tests are used to assess children from culturally, linguistically and ethnically diverse backgrounds (e.g., Spanish-speaking English language learners) or who are delayed in some manner. One of the biggest challenges is establishing the appropriateness of these measures with non-English or non-standard English speaking children as often they score one to two standard deviations below expected levels (e.g., Lonigan et al., 2013). This study re-examines the issues in analyzing the PPVT-4 and EVT-2 scores in a sample of 4-to-5-year-old low SES Hispanic preschool children who were part of a larger randomized clinical trial on the effects of a supplemental English shared-reading vocabulary curriculum (Pollard-Durodola et al., 2016). It was found that data exhibited strong floor effects and the presence of floor effects made it difficult to differentiate the intervention group and the control group on their vocabulary growth in the intervention. A simulation study is then presented under the multilevel structural equation modeling (MSEM) framework and results revealed that in regular multilevel data analysis, ignoring floor effects in the outcome variables led to biased results in parameter estimates, standard error estimates, and significance tests. Our findings suggest caution in analyzing and interpreting scores of ethnically and culturally diverse children on standardized vocabulary tests (e.g., floor effects). It is recommended that appropriate analytical methods that take into account floor effects in outcome variables be considered.

  18. LOD score exclusion analyses for candidate QTLs using random population samples.

    Science.gov (United States)

    Deng, Hong-Wen

    2003-11-01

    While extensive analyses have been conducted to test for, no formal analyses have been conducted to test against, the importance of candidate genes as putative QTLs using random population samples. Previously, we developed an LOD score exclusion mapping approach for candidate genes for complex diseases. Here, we extend this LOD score approach for exclusion analyses of candidate genes for quantitative traits. Under this approach, specific genetic effects (as reflected by heritability) and inheritance models at candidate QTLs can be analyzed and if an LOD score is ≤ -2.0, the locus can be excluded from having a heritability larger than that specified. Simulations show that this approach has high power to exclude a candidate gene from having moderate genetic effects if it is not a QTL and is robust to population admixture. Our exclusion analysis complements association analysis for candidate genes as putative QTLs in random population samples. The approach is applied to test the importance of Vitamin D receptor (VDR) gene as a potential QTL underlying the variation of bone mass, an important determinant of osteoporosis.

  19. tscvh R Package: Computational of the two samples test on microarray-sequencing data

    Science.gov (United States)

    Fajriyah, Rohmatul; Rosadi, Dedi

    2017-12-01

    We present a new R package, tscvh (two samples cross-variance homogeneity). The package implements the cross-variance statistical test proposed and introduced by Fajriyah ([3] and [4]), which is based on the cross-variance concept. The test can be used as an alternative test for a significant difference between two means when the sample size is small, a situation that often arises in bioinformatics research. A p-value can also be provided, based on the statistical distribution of the test statistic. The package assumes homogeneity of variance between samples.

  20. Evaluation of Two Methods for Modeling Measurement Errors When Testing Interaction Effects with Observed Composite Scores

    Science.gov (United States)

    Hsiao, Yu-Yu; Kwok, Oi-Man; Lai, Mark H. C.

    2018-01-01

    Path models with observed composites based on multiple items (e.g., mean or sum score of the items) are commonly used to test interaction effects. Under this practice, researchers generally assume that the observed composites are measured without errors. In this study, we reviewed and evaluated two alternative methods within the structural…

  1. Validation of new prognostic and predictive scores by sequential testing approach

    International Nuclear Information System (INIS)

    Nieder, Carsten; Haukland, Ellinor; Pawinski, Adam; Dalhaug, Astrid

    2010-01-01

    Background and Purpose: For practitioners, the question arises how their own patient population differs from that used in large-scale analyses resulting in new scores and nomograms and whether such tools actually are valid at a local level and thus can be implemented. A recent article proposed an easy-to-use method for the in-clinic validation of new prediction tools with a limited number of patients, a so-called sequential testing approach. The present study evaluates this approach in scores related to radiation oncology. Material and Methods: Three different scores were used, each predicting short overall survival after palliative radiotherapy (bone metastases, brain metastases, metastatic spinal cord compression). For each scenario, a limited number of consecutive patients entered the sequential testing approach. The positive predictive value (PPV) was used for validation of the respective score and it was required that the PPV exceeded 80%. Results: For two scores, validity in the own local patient population could be confirmed after entering 13 and 17 patients, respectively. For the third score, no decision could be reached even after increasing the sample size to 30. Conclusion: In-clinic validation of new predictive tools with sequential testing approach should be preferred over uncritical adoption of tools which provide no significant benefit to local patient populations. Often the necessary number of patients can be reached within reasonable time frames even in small oncology practices. In addition, validation is performed continuously as the data are collected. (orig.)

  2. Validation of new prognostic and predictive scores by sequential testing approach

    Energy Technology Data Exchange (ETDEWEB)

    Nieder, Carsten [Radiation Oncology Unit, Nordland Hospital, Bodo (Norway); Inst. of Clinical Medicine, Univ. of Tromso (Norway); Haukland, Ellinor; Pawinski, Adam; Dalhaug, Astrid [Radiation Oncology Unit, Nordland Hospital, Bodo (Norway)

    2010-03-15

    Background and Purpose: For practitioners, the question arises how their own patient population differs from that used in large-scale analyses resulting in new scores and nomograms and whether such tools actually are valid at a local level and thus can be implemented. A recent article proposed an easy-to-use method for the in-clinic validation of new prediction tools with a limited number of patients, a so-called sequential testing approach. The present study evaluates this approach in scores related to radiation oncology. Material and Methods: Three different scores were used, each predicting short overall survival after palliative radiotherapy (bone metastases, brain metastases, metastatic spinal cord compression). For each scenario, a limited number of consecutive patients entered the sequential testing approach. The positive predictive value (PPV) was used for validation of the respective score and it was required that the PPV exceeded 80%. Results: For two scores, validity in the own local patient population could be confirmed after entering 13 and 17 patients, respectively. For the third score, no decision could be reached even after increasing the sample size to 30. Conclusion: In-clinic validation of new predictive tools with sequential testing approach should be preferred over uncritical adoption of tools which provide no significant benefit to local patient populations. Often the necessary number of patients can be reached within reasonable time frames even in small oncology practices. In addition, validation is performed continuously as the data are collected. (orig.)

  3. Sample Size Determination for One- and Two-Sample Trimmed Mean Tests

    Science.gov (United States)

    Luh, Wei-Ming; Olejnik, Stephen; Guo, Jiin-Huarng

    2008-01-01

    Formulas to determine the necessary sample sizes for parametric tests of group comparisons are available from several sources and appropriate when population distributions are normal. However, in the context of nonnormal population distributions, researchers recommend Yuen's trimmed mean test, but formulas to determine sample sizes have not been…

  4. Do Test Scores Buy Happiness?

    Science.gov (United States)

    McCluskey, Neal

    2017-01-01

    Since at least the enactment of No Child Left Behind in 2002, standardized test scores have served as the primary measures of public school effectiveness. Yet, such scores fail to measure the ultimate goal of education: maximizing happiness. This exploratory analysis assesses nation level associations between test scores and happiness, controlling…

  5. Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF) scores generated from the MMPI-2 and MMPI-2-RF test booklets: internal structure comparability in a sample of criminal defendants.

    Science.gov (United States)

    Tarescavage, Anthony M; Alosco, Michael L; Ben-Porath, Yossef S; Wood, Arcangela; Luna-Jones, Lynn

    2015-04-01

    We investigated the internal structure comparability of Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF) scores derived from the MMPI-2 and MMPI-2-RF booklets in a sample of 320 criminal defendants (229 males and 54 females). After exclusion of invalid protocols, the final sample consisted of 96 defendants who were administered the MMPI-2-RF booklet and 83 who completed the MMPI-2. No statistically significant differences in MMPI-2-RF invalidity rates were observed between the two forms. Individuals in the final sample who completed the MMPI-2-RF did not statistically differ on demographics or referral question from those who were administered the MMPI-2 booklet. Independent t tests showed no statistically significant differences between MMPI-2-RF scores generated with the MMPI-2 and MMPI-2-RF booklets on the test's substantive scales. Statistically significant small differences were observed on the revised Variable Response Inconsistency (VRIN-r) and True Response Inconsistency (TRIN-r) scales. Cronbach's alpha and standard errors of measurement were approximately equal between the booklets for all MMPI-2-RF scales. Finally, MMPI-2-RF intercorrelations produced from the two forms yielded mostly small and a few medium differences, indicating that discriminant validity and test structure are maintained. Overall, our findings reflect the internal structure comparability of MMPI-2-RF scale scores generated from MMPI-2 and MMPI-2-RF booklets. Implications of these results and limitations of these findings are discussed. © The Author(s) 2014.

  6. Group differences in the heritability of items and test scores

    NARCIS (Netherlands)

    Wicherts, J.M.; Johnson, W.

    2009-01-01

    It is important to understand potential sources of group differences in the heritability of intelligence test scores. On the basis of a basic item response model we argue that heritabilities which are based on dichotomous item scores normally do not generalize from one sample to the next. If groups

  7. LOD score exclusion analyses for candidate genes using random population samples.

    Science.gov (United States)

    Deng, H W; Li, J; Recker, R R

    2001-05-01

    While extensive analyses have been conducted to test for, no formal analyses have been conducted to test against, the importance of candidate genes with random population samples. We develop a LOD score approach for exclusion analyses of candidate genes with random population samples. Under this approach, specific genetic effects and inheritance models at candidate genes can be analysed and if a LOD score is ≤ -2.0, the locus can be excluded from having an effect larger than that specified. Computer simulations show that, with sample sizes often employed in association studies, this approach has high power to exclude a gene from having moderate genetic effects. In contrast to regular association analyses, population admixture will not affect the robustness of our analyses; in fact, it renders our analyses more conservative and thus any significant exclusion result is robust. Our exclusion analysis complements association analysis for candidate genes in random population samples and is parallel to the exclusion mapping analyses that may be conducted in linkage analyses with pedigrees or relative pairs. The usefulness of the approach is demonstrated by an application to test the importance of vitamin D receptor and estrogen receptor genes underlying the differential risk to osteoporotic fractures.

  8. Predicting occupational personality test scores.

    Science.gov (United States)

    Furnham, A; Drakeley, R

    2000-01-01

    The relationship between students' actual test scores and their self-estimated scores on the Hogan Personality Inventory (HPI; R. Hogan & J. Hogan, 1992), an omnibus personality questionnaire, was examined. Despite being given descriptive statistics and explanations of each of the dimensions measured, the students tended to overestimate their scores; yet all correlations between actual and estimated scores were positive and significant. Correlations between self-estimates and actual test scores were highest for sociability, ambition, and adjustment (r = .62 to r = .67). The results are discussed in terms of employers' use and abuse of personality assessment for job recruitment.

  9. Outlier removal, sum scores, and the inflation of the Type I error rate in independent samples t tests: the power of alternatives and recommendations.

    Science.gov (United States)

    Bakker, Marjan; Wicherts, Jelte M

    2014-09-01

    In psychology, outliers are often excluded before running an independent samples t test, and data are often nonnormal because of the use of sum scores based on tests and questionnaires. This article concerns the handling of outliers in the context of independent samples t tests applied to nonnormal sum scores. After reviewing common practice, we present results of simulations of artificial and actual psychological data, which show that the removal of outliers based on commonly used Z value thresholds severely increases the Type I error rate. We found Type I error rates of above 20% after removing outliers with a threshold value of Z = 2 in a short and difficult test. Inflations of Type I error rates are particularly severe when researchers are given the freedom to alter threshold values of Z after having seen the effects thereof on outcomes. We recommend the use of nonparametric Mann-Whitney-Wilcoxon tests or robust Yuen-Welch tests without removing outliers. These alternatives to independent samples t tests are found to have nominal Type I error rates with a minimal loss of power when no outliers are present in the data and to have nominal Type I error rates and good power when outliers are present. PsycINFO Database Record (c) 2014 APA, all rights reserved.
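    The two procedures contrasted in this abstract can be sketched in a few lines of Python. This is only an illustration: the function names and the sample data are hypothetical, the Z rule is one common variant (absolute Z above a threshold, computed within one group), and the second function returns only the Mann-Whitney U statistic, not a p value.

```python
from statistics import mean, stdev


def remove_outliers_z(scores, z_threshold=2.0):
    """Drop values whose absolute Z score exceeds z_threshold.

    This is the practice the abstract warns against; it is shown
    only to make the procedure concrete.
    """
    m = mean(scores)
    s = stdev(scores)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return list(scores)
    return [x for x in scores if abs((x - m) / s) <= z_threshold]


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs with a-value > b-value, ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical sum scores with one extreme value.
group = [1, 2, 2, 3, 3, 3, 4, 4, 20]
trimmed = remove_outliers_z(group, z_threshold=2.0)
u_stat = mann_whitney_u([1, 2, 3], [2, 4])
```

    The rank-based statistic uses every observation, which is why no trimming step is needed before computing it.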

  10. Increased correlation coefficient between the written test score and tutors' performance test scores after training of tutors for assessment of medical students during problem-based learning course in Malaysia.

    Science.gov (United States)

    Jaiprakash, Heethal; Min, Aung Ko Ko; Ghosh, Sarmishtha

    2016-03-01

    This paper is aimed at finding if there was a change of correlation between the written test score and tutors' performance test scores in the assessment of medical students during a problem-based learning (PBL) course in Malaysia. This is a cross-sectional observational study, conducted among 264 medical students in two groups from November 2010 to November 2012. The first group's tutors did not receive tutor training; while the second group's tutors were trained in the PBL process. Each group was divided into high, middle and low achievers based on their end-of-semester exam scores. PBL scores were taken which included written test scores and tutors' performance test scores. Pearson correlation coefficient was calculated between the two kinds of scores in each group. The correlation coefficient between the written scores and tutors' scores in group 1 was 0.099 (p<0.001) and for group 2 was 0.305 (p<0.001). The higher correlation coefficient in the group where tutors received the PBL training reinforces the importance of tutor training before their participation in the PBL course.
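    For readers unfamiliar with the statistic, the Pearson correlation coefficient named in this abstract can be computed as in the minimal sketch below. The function name and the example scores are hypothetical and are not the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx = sum(x) / n
    my = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


# Hypothetical written-test and tutor-rated scores for five students.
written = [55, 62, 70, 74, 81]
tutor = [60, 58, 72, 69, 80]
r = pearson_r(written, tutor)
```

    Computing the coefficient separately within each tutor group, as the study describes, amounts to calling the function once per group.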

  11. Optimum sample size allocation to minimize cost or maximize power for the two-sample trimmed mean test.

    Science.gov (United States)

    Guo, Jiin-Huarng; Luh, Wei-Ming

    2009-05-01

    When planning a study, sample size determination is one of the most important tasks facing the researcher. The size will depend on the purpose of the study, the cost limitations, and the nature of the data. By specifying the standard deviation ratio and/or the sample size ratio, the present study considers the problem of heterogeneous variances and non-normality for Yuen's two-group test and develops sample size formulas to minimize the total cost or maximize the power of the test. For a given power, the sample size allocation ratio can be manipulated so that the proposed formulas can minimize the total cost, the total sample size, or the sum of total sample size and total cost. On the other hand, for a given total cost, the optimum sample size allocation ratio can maximize the statistical power of the test. After the sample size is determined, the present simulation applies Yuen's test to the sample generated, and then the procedure is validated in terms of Type I errors and power. Simulation results show that the proposed formulas can control Type I errors and achieve the desired power under the various conditions specified. Finally, the implications for determining sample sizes in experimental studies and future research are discussed.

  12. A Comparison of Two Scoring Methods for an Automated Speech Scoring System

    Science.gov (United States)

    Xi, Xiaoming; Higgins, Derrick; Zechner, Klaus; Williamson, David

    2012-01-01

    This paper compares two alternative scoring methods--multiple regression and classification trees--for an automated speech scoring system used in a practice environment. The two methods were evaluated on two criteria: construct representation and empirical performance in predicting human scores. The empirical performance of the two scoring models…

  13. Assessing Reliability of Two Versions of Vocabulary Levels Tests in Iranian Context

    Directory of Open Access Journals (Sweden)

    Aso Bayazidi

    2017-02-01

    Full Text Available This study examined the equivalence and reliability of the two versions of the Vocabulary Levels Test in an Iranian context. This study was motivated by the fact that the Vocabulary Levels Test is increasingly being used in Iran for both research and pedagogical purposes without having been checked for validity and reliability in this context. The equivalence and reliability of the two versions of the test were examined through the parallel-form approach to reliability in Classical True Score theory. Seventy-five intermediate learners of English as a foreign language at the Iran Language Institute took the two versions of the test with a one-week interval between the two administrations in a counterbalanced fashion. To examine the equivalence of the two versions, the means and variances of the scores obtained for the two tests were compared using paired-sample t-test and one-way ANOVA, respectively. The results of the analyses indicated that the difference between the means of the two versions was significant, and the two versions cannot be considered as parallel forms. To assess the reliability of the two versions, the correlation between the scores obtained from them was estimated using Pearson Product Moment correlation. The results of the analyses showed that the two versions are highly correlated and are reliable tests. It is concluded that the two versions should not be treated as equivalent in longitudinal and gain score studies.

  14. Cross-validation of the Dot Counting Test in a large sample of credible and non-credible patients referred for neuropsychological testing.

    Science.gov (United States)

    McCaul, Courtney; Boone, Kyle B; Ermshar, Annette; Cottingham, Maria; Victor, Tara L; Ziegler, Elizabeth; Zeller, Michelle A; Wright, Matthew

    2018-01-18

    To cross-validate the Dot Counting Test in a large neuropsychological sample. Dot Counting Test scores were compared in credible (n = 142) and non-credible (n = 335) neuropsychology referrals. Non-credible patients scored significantly higher than credible patients on all Dot Counting Test scores. While the original E-score cut-off of ≥17 achieved excellent specificity (96.5%), it was associated with mediocre sensitivity (52.8%). However, the cut-off could be substantially lowered to ≥13.80, while still maintaining adequate specificity (≥90%), and raising sensitivity to 70.0%. Examination of non-credible subgroups revealed that Dot Counting Test sensitivity in feigned mild traumatic brain injury (mTBI) was 55.8%, whereas sensitivity was 90.6% in patients with non-credible cognitive dysfunction in the context of claimed psychosis, and 81.0% in patients with non-credible cognitive performance in depression or severe TBI. Thus, the Dot Counting Test may have a particular role in detection of non-credible cognitive symptoms in claimed psychiatric disorders. Alternative to use of the E-score, failure on ≥1 cut-offs applied to individual Dot Counting Test scores (≥6.0″ for mean grouped dot counting time, ≥10.0″ for mean ungrouped dot counting time, and ≥4 errors), occurred in 11.3% of the credible sample, while nearly two-thirds (63.6%) of the non-credible sample failed one or more of these cut-offs. An E-score cut-off of 13.80, or failure on ≥1 individual score cut-offs, resulted in few false positive identifications in credible patients, and achieved high sensitivity (64.0-70.0%), and therefore appear appropriate for use in identifying neurocognitive performance invalidity.
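    The trade-off described here (lowering a cut-off raises sensitivity at some cost in specificity) can be illustrated with a short sketch. The function name and the scores are hypothetical; only the two cut-off values, 17 and 13.80, come from the abstract, and "fail" is assumed to mean a score at or above the cut-off.

```python
def sensitivity_specificity(credible_scores, noncredible_scores, cutoff):
    """Return (sensitivity, specificity) for the rule 'score >= cutoff fails'.

    Sensitivity: share of non-credible cases flagged by the cut-off.
    Specificity: share of credible cases not flagged.
    """
    true_pos = sum(1 for s in noncredible_scores if s >= cutoff)
    true_neg = sum(1 for s in credible_scores if s < cutoff)
    return true_pos / len(noncredible_scores), true_neg / len(credible_scores)


# Hypothetical E-scores, not the study's data.
credible = [8.0, 10.0, 12.0, 14.0]
noncredible = [12.0, 15.0, 18.0, 20.0]

at_17 = sensitivity_specificity(credible, noncredible, 17.0)
at_13_8 = sensitivity_specificity(credible, noncredible, 13.80)
```

    With these made-up scores, moving the cut-off from 17 to 13.80 flags one more non-credible case and one more credible case, which mirrors the direction of the change reported in the abstract.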

  15. TOEFL iBT Speaking Test Scores as Indicators of Oral Communicative Language Proficiency

    Science.gov (United States)

    Bridgeman, Brent; Powers, Donald; Stone, Elizabeth; Mollaun, Pamela

    2012-01-01

    Scores assigned by trained raters and by an automated scoring system (SpeechRater[TM]) on the speaking section of the TOEFL iBT[TM] were validated against a communicative competence criterion. Specifically, a sample of 555 undergraduate students listened to speech samples from 184 examinees who took the Test of English as a Foreign Language…

  16. Increased correlation coefficient between the written test score and tutors’ performance test scores after training of tutors for assessment of medical students during problem-based learning course in Malaysia

    Directory of Open Access Journals (Sweden)

    Heethal Jaiprakash

    2016-03-01

    Full Text Available This paper is aimed at finding if there was a change of correlation between the written test score and tutors’ performance test scores in the assessment of medical students during a problem-based learning (PBL) course in Malaysia. This is a cross-sectional observational study, conducted among 264 medical students in two groups from November 2010 to November 2012. The first group’s tutors did not receive tutor training; while the second group’s tutors were trained in the PBL process. Each group was divided into high, middle and low achievers based on their end-of-semester exam scores. PBL scores were taken which included written test scores and tutors’ performance test scores. Pearson correlation coefficient was calculated between the two kinds of scores in each group. The correlation coefficient between the written scores and tutors’ scores in group 1 was 0.099 (p<0.001) and for group 2 was 0.305 (p<0.001). The higher correlation coefficient in the group where tutors received the PBL training reinforces the importance of tutor training before their participation in the PBL course.

  17. Equating error in observed-score equating

    NARCIS (Netherlands)

    van der Linden, Willem J.

    2006-01-01

    Traditionally, error in equating observed scores on two versions of a test is defined as the difference between the transformations that equate the quantiles of their distributions in the sample and population of test takers. But it is argued that if the goal of equating is to adjust the scores of

  18. High throughput sample processing and automated scoring

    Directory of Open Access Journals (Sweden)

    Gunnar Brunborg

    2014-10-01

    Full Text Available The comet assay is a sensitive and versatile method for assessing DNA damage in cells. In the traditional version of the assay, there are many manual steps involved and few samples can be treated in one experiment. High throughput modifications have been developed during recent years, and they are reviewed and discussed. These modifications include accelerated scoring of comets; other important elements that have been studied and adapted to high throughput are cultivation and manipulation of cells or tissues before and after exposure, and freezing of treated samples until comet analysis and scoring. High throughput methods save time and money but they are useful also for other reasons: large-scale experiments may be performed which are otherwise not practicable (e.g., analysis of many organs from exposed animals, and human biomonitoring studies), and automation gives more uniform sample treatment and less dependence on operator performance. The high throughput modifications now available vary largely in their versatility, capacity, complexity and costs. The bottleneck for further increase of throughput appears to be the scoring.

  19. Reformulation of the Children's Eating Attitudes Test (ChEAT): factor structure and scoring method in a non-clinical population.

    Science.gov (United States)

    Anton, S D; Han, H; Newton, R L; Martin, C K; York-Crowe, E; Stewart, T M; Williamson, D A

    2006-12-01

    The primary aims of this study were to empirically test the factor structure of the Children's Eating Attitudes Test (ChEAT) through both exploratory and confirmatory factor analyses and to interpret the factor structure of the ChEAT within the context of a new scoring method. The ChEAT was administered to 728 children in the 2nd through 6th grades (from five schools) at two different time points. Exactly half the students were male and half were female. To the best of our knowledge, this is the first study to empirically test the merits of an alternative 6-point scoring system as compared to the traditionally used 4-point scoring system. With the new scoring procedure, the skewness for all factor scores decreased, which resulted in increased variance in the item scores, as well as the total ChEAT score. Since the internal consistency of two factors in a recently proposed model was not acceptable (ChEAT reported by previous investigations. Intercorrelations among the factors suggested three higher order constructs. These findings indicate that the ChEAT subscales may be sufficiently stable to allow use in non-clinical samples of children.

  20. Testing the entrepreneurial intention model on a two-country sample

    OpenAIRE

    Liñán, Francisco; Chen, Yi-Wen

    2006-01-01

    This paper tests the Entrepreneurial Intention Model -which is adapted from the Theory of Planned Behavior- on a sample of 533 individuals from two quite different countries: one of them European (Spain) and the other South Asian (Taiwan). A newly developed Entrepreneurial Intention Questionnaire (EIQ) has been used which tries to overcome some of the limitations of previous instruments. Structural equations techniques were used in the empirical analysis. Results are generally...

  1. The effect of instructional methodology on high school students natural sciences standardized tests scores

    Science.gov (United States)

    Powell, P. E.

    Educators have recently come to consider inquiry-based instruction as a more effective method of instruction than didactic instruction. Experience-based learning theory suggests that student performance is linked to teaching method. However, research is limited on inquiry teaching and its effectiveness on preparing students to perform well on standardized tests. The purpose of the study was to investigate whether one of these two teaching methodologies was more effective in increasing student performance on standardized science tests. The quasi-experimental quantitative study comprised two stages. Stage 1 used a survey to identify teaching methods of a convenience sample of 57 teacher participants and determined the level of inquiry used in instruction to place participants into instructional groups (the independent variable). Stage 2 used analysis of covariance (ANCOVA) to compare posttest scores on a standardized exam by teaching method. Additional analyses were conducted to examine the differences in science achievement by ethnicity, gender, and socioeconomic status by teaching methodology. Results demonstrated a statistically significant gain in test scores when taught using inquiry-based instruction. Subpopulation analyses indicated all groups showed improved mean standardized test scores except African American students. The findings benefit teachers and students by presenting data supporting a method of content delivery that increases teacher efficacy and produces students with a greater cognition of science content that meets the school's mission and goals.

  2. On the High-dimensional Power of Linear-time Kernel Two-Sample Testing under Mean-difference Alternatives

    OpenAIRE

    Ramdas, Aaditya; Reddi, Sashank J.; Poczos, Barnabas; Singh, Aarti; Wasserman, Larry

    2014-01-01

    Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests - those which are consistent without any assumptions about how the distributions may differ (general alternatives), and those which are designed to specifically test easier alternatives, like a difference in me...

  3. Testing the entrepreneurial intention model on a two-country sample

    OpenAIRE

    Liñán, Francisco

    2006-01-01

    This paper tests the Entrepreneurial Intention Model -which is adapted from the Theory of Planned Behavior- on a sample of 533 individuals from two quite different countries: one of them European (Spain) and the other South Asian (Taiwan). A newly developed Entrepreneurial Intention Questionnaire (EIQ) has been used which tries to overcome some of the limitations of previous instruments. Structural equations techniques were used in the empirical analysis. Results are generally satisfactory, ...

  4. Validating the Interpretations and Uses of Test Scores

    Science.gov (United States)

    Kane, Michael T.

    2013-01-01

    To validate an interpretation or use of test scores is to evaluate the plausibility of the claims based on the scores. An argument-based approach to validation suggests that the claims based on the test scores be outlined as an argument that specifies the inferences and supporting assumptions needed to get from test responses to score-based…

  5. [Relationship between unipedal stance test score and center of pressure velocity in elderly].

    Science.gov (United States)

    Rodrigo Antonio, Guzmán; Rony, Silvestre; Francisco Aniceto, Rodríguez; David Andrés, Arriagada; Pablo Andrés, Ortega

    2011-01-01

    Frequent falls are one of the most important health problems in the elderly population. The unipedal stance test (UPST) assesses postural stability and is used in fall risk measures. Despite this, there is little information about its relationship with posturographic parameters (PP) that characterize postural stability. Center of pressure velocity (CoPV) is one of the best PP that describe postural stability. The aim of this study was to analyze the relation between UPST score and CoPV in an elderly population. A sample of 38 healthy elderly subjects was divided into two groups according to their UPST score, low performance (LP, n=11) and high performance (HP, n=27). The correlation between UPST score and CoP mean velocity (CoPmV), recorded from a posturographic test, was analyzed in both groups. An inverse correlation between UPST score and CoPmV was found in both groups. However, this was higher in the LP group (r=-0.69, P=.02) compared to the HP (r=-0.39, P=.04). Based on the results of this investigation, it may be concluded that the achievement on UPST has an inverse relationship with CoPmV, especially in subjects with low performance in the UPST. Copyright © 2010 SEGG. Published by Elsevier Espana. All rights reserved.

  6. Effects of white noise on Callsign Acquisition Test and Modified Rhyme Test scores.

    Science.gov (United States)

    Blue-Terry, Misty; Letowski, Tomasz

    2011-02-01

    The Callsign Acquisition Test (CAT) is a speech intelligibility test developed by the US Army Research Laboratory. The test has been used to evaluate speech transmission through various communication systems but has not been yet sufficiently standardised and validated. The aim of this study was to compare CAT and Modified Rhyme Test (MRT) performance in the presence of white noise across a range of signal-to-noise ratios (SNRs). A group of 16 normal-hearing listeners participated in the study. The speech items were presented at 65 dB(A) in the background of white noise at SNRs of -18, -15, -12, -9 and -6 dB. The results showed a strong positive association (75.14%) between the two tests, but significant differences between the CAT and MRT absolute scores in the range of investigated SNRs. Based on the data, a function to predict CAT scores based on existing MRT scores and vice versa was formulated. STATEMENT OF RELEVANCE: This work compares performance data of a common speech intelligibility test (MRT) with a new test (CAT) in the presence of white noise. The results here can be used as a part of the standardisation procedures and provide insights to the predictive capabilities of the CAT to quantify speech intelligibility communication in high-noise military environments.

  7. On the Representativeness of Norming Samples for Aptitude Test

    National Research Council Canada - National Science Library

    Sims, William

    2003-01-01

    ...). We regressed aptitude test scores on demographics and concluded that: Norming sample for aptitude tests must be representative of the target population with respect to age, race/ethnicity, gender, respondent's education, and mother's...

  8. Two sampling techniques for game meat

    OpenAIRE

    van der Merwe, Maretha; Jooste, Piet J.; Hoffman, Louw C.; Calitz, Frikkie J.

    2013-01-01

    A study was conducted to compare the excision sampling technique used by the export market and the sampling technique preferred by European countries, namely the biotrace cattle and swine test. The measuring unit for the excision sampling was grams (g) and square centimetres (cm²) for the swabbing technique. The two techniques were compared after a pilot test was conducted on spiked approved beef carcasses (n = 12) that statistically proved the two measuring units correlated. The two sampling...

  9. Test/score/report: Simulation techniques for automating the test process

    Science.gov (United States)

    Hageman, Barbara H.; Sigman, Clayton B.; Koslosky, John T.

    1994-01-01

    A Test/Score/Report capability is currently being developed for the Transportable Payload Operations Control Center (TPOCC) Advanced Spacecraft Simulator (TASS) system which will automate testing of the Goddard Space Flight Center (GSFC) Payload Operations Control Center (POCC) and Mission Operations Center (MOC) software in three areas: telemetry decommutation, spacecraft command processing, and spacecraft memory load and dump processing. Automated computer control of the acceptance test process is one of the primary goals of a test team. With the proper simulation tools and user interface, the task of acceptance testing, regression testing, and repeatability of specific test procedures of a ground data system can be a simpler task. Ideally, the goal for complete automation would be to plug the operational deliverable into the simulator, press the start button, execute the test procedure, accumulate and analyze the data, score the results, and report the results to the test team along with a go/no recommendation to the test team. In practice, this may not be possible because of inadequate test tools, pressures of schedules, limited resources, etc. Most tests are accomplished using a certain degree of automation and test procedures that are labor intensive. This paper discusses some simulation techniques that can improve the automation of the test process. The TASS system tests the POCC/MOC software and provides a score based on the test results. The TASS system displays statistics on the success of the POCC/MOC system processing in each of the three areas as well as event messages pertaining to the Test/Score/Report processing. The TASS system also provides formatted reports documenting each step performed during the tests and the results of each step. A prototype of the Test/Score/Report capability is available and currently being used to test some POCC/MOC software deliveries. When this capability is fully operational it should greatly reduce the time necessary

  10. WAIS-III index score profiles in the Canadian standardization sample.

    Science.gov (United States)

    Lange, Rael T

    2007-01-01

    Representative index score profiles were examined in the Canadian standardization sample of the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III). The identification of profile patterns was based on the methodology proposed by Lange, Iverson, Senior, and Chelune (2002) that aims to maximize the influence of profile shape and minimize the influence of profile magnitude on the cluster solution. A two-step cluster analysis procedure was used (i.e., hierarchical and k-means analyses). Cluster analysis of the four index scores (i.e., Verbal Comprehension [VCI], Perceptual Organization [POI], Working Memory [WMI], Processing Speed [PSI]) identified six profiles in this sample. Profiles were differentiated by pattern of performance and were primarily characterized as (a) high VCI/POI, low WMI/PSI, (b) low VCI/POI, high WMI/PSI, (c) high PSI, (d) low PSI, (e) high VCI/WMI, low POI/PSI, and (f) low VCI, high POI. These profiles are potentially useful for determining whether a patient's WAIS-III performance is unusual in a normal population.

  11. Clock Drawing Test and the diagnosis of amnestic mild cognitive impairment: can more detailed scoring systems do the work?

    Science.gov (United States)

    Rubínová, Eva; Nikolai, Tomáš; Marková, Hana; Siffelová, Kamila; Laczó, Jan; Hort, Jakub; Vyhnálek, Martin

    2014-01-01

    The Clock Drawing Test is a frequently used cognitive screening test with several scoring systems in elderly populations. We compare simple and complex scoring systems and evaluate the usefulness of the combination of the Clock Drawing Test with the Mini-Mental State Examination to detect patients with mild cognitive impairment. Patients with amnestic mild cognitive impairment (n = 48) and age- and education-matched controls (n = 48) underwent neuropsychological examinations, including the Clock Drawing Test and the Mini-Mental State Examination. Clock drawings were scored by three blinded raters using one simple (6-point scale) and two complex (17- and 18-point scales) systems. The sensitivity and specificity of these scoring systems used alone and in combination with the Mini-Mental State Examination were determined. Complex scoring systems, but not the simple scoring system, were significant predictors of the amnestic mild cognitive impairment diagnosis in logistic regression analysis. At equal levels of sensitivity (87.5%), the Mini-Mental State Examination showed higher specificity (31.3%, compared with 12.5% for the 17-point Clock Drawing Test scoring scale). The combination of Clock Drawing Test and Mini-Mental State Examination scores increased the area under the curve (0.72; p Drawing Test did not differentiate between healthy elderly and patients with amnestic mild cognitive impairment in our sample. Complex scoring systems were slightly more efficient, yet still were characterized by high rates of false-positive results. We found psychometric improvement using combined scores from the Mini-Mental State Examination and the Clock Drawing Test when complex scoring systems were used. The results of this study support the benefit of using combined scores from simple methods.

  12. Adaptive testing with equated number-correct scoring

    NARCIS (Netherlands)

    van der Linden, Willem J.

    1999-01-01

    A constrained CAT algorithm is presented that automatically equates the number-correct scores on adaptive tests. The algorithm can be used to equate number-correct scores across different administrations of the same adaptive test as well as to an external reference test. The constraints are derived

  13. Evaluation of Factors Affecting Continuous Performance Test Identical Pairs Version Score of Schizophrenic Patients in a Japanese Clinical Sample

    Directory of Open Access Journals (Sweden)

    Takayoshi Koide

    2012-01-01

    Full Text Available Aim. Cognitive impairment in schizophrenia strongly relates to social outcome and is a good candidate for endophenotypes. When we accurately measure drug efficacy or effects of genes or variants relevant to schizophrenia on cognitive impairment, clinical factors that can affect scores on cognitive tests, such as age and severity of symptoms, should be considered. To elucidate the effect of clinical factors, we conducted multiple regression analysis using scores of the Continuous Performance Test Identical Pairs Version (CPT-IP), which is often used to measure attention/vigilance in schizophrenia. Methods. We conducted the CPT-IP (4-4 digit) and examined clinical information (sex, age, education years, onset age, duration of illness, chlorpromazine-equivalent dose, and Positive and Negative Symptom Scale (PANSS) scores) in 126 schizophrenia patients in a Japanese population. Multiple regression analysis was used to evaluate the effect of clinical factors. Results. Age, chlorpromazine-equivalent dose, and PANSS-negative symptom score were associated with mean d′ score in patients. These three clinical factors explained about 28% of the variance in mean d′ score. Conclusions. CPT-IP score in schizophrenia patients is influenced by age, chlorpromazine-equivalent dose, and PANSS negative symptom score.

  14. ITC Guidelines on Quality Control in Scoring, Test Analysis, and Reporting of Test Scores

    Science.gov (United States)

    Allalouf, Avi

    2014-01-01

    The Quality Control (QC) Guidelines are intended to increase the efficiency, precision, and accuracy of the scoring, analysis, and reporting process of testing. The QC Guidelines focus on large-scale testing operations where multiple forms of tests are created for use on set dates. However, they may also be used for a wide variety of other testing…

  15. Summary of Score Changes (in other Tests).

    Science.gov (United States)

    Cleary, T. Anne; McCandless, Sam A.

    Scholastic Aptitude Test (SAT) scores have declined during the last 14 years. Similar score declines have been observed in many different testing programs, many groups, and tested areas. The declines, while not large in any given year, have been consistent over time, area, and group. The period around 1965 is critical for the interpretation of…

  16. Cost-effectiveness of one versus two sample faecal immunochemical testing for colorectal cancer screening.

    Science.gov (United States)

    Goede, S Lucas; van Roon, Aafke H C; Reijerink, Jacqueline C I Y; van Vuuren, Anneke J; Lansdorp-Vogelaar, Iris; Habbema, J Dik F; Kuipers, Ernst J; van Leerdam, Monique E; van Ballegooijen, Marjolein

    2013-05-01

    The sensitivity and specificity of a single faecal immunochemical test (FIT) are limited. The performance of FIT screening can be improved by increasing the screening frequency or by providing more than one sample in each screening round. This study aimed to evaluate if two-sample FIT screening is cost-effective compared with one-sample FIT. The MISCAN-colon microsimulation model was used to estimate costs and benefits of strategies with either one or two-sample FIT screening. The FIT cut-off level varied between 50 and 200 ng haemoglobin/ml, and the screening schedule was varied with respect to age range and interval. In addition, different definitions for positivity of the two-sample FIT were considered: at least one positive sample, two positive samples, or the mean of both samples being positive. Within an exemplary screening strategy, biennial FIT from the age of 55-75 years, one-sample FIT provided 76.0-97.0 life-years gained (LYG) per 1000 individuals, at a cost of € 259,000-264,000 (range reflects different FIT cut-off levels). Two-sample FIT screening with at least one sample being positive provided 7.3-12.4 additional LYG compared with one-sample FIT at an extra cost of € 50,000-59,000. However, when all screening intervals and age ranges were considered, intensifying screening with one-sample FIT provided equal or more LYG at lower costs compared with two-sample FIT. If attendance to screening does not differ between strategies it is recommended to increase the number of screening rounds with one-sample FIT screening, before considering increasing the number of FIT samples provided per screening round.
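
The three positivity definitions for two-sample FIT named in this abstract can be sketched as a small function. This is an illustrative sketch only: the function name, the argument names, and the choice of "greater than or equal to" as the comparison are assumptions, not details taken from the study.

```python
def two_sample_fit_positive(sample_a, sample_b, cutoff, rule="at_least_one"):
    """Classify a two-sample FIT result under one of three positivity rules.

    sample_a, sample_b: haemoglobin concentrations in ng/ml (hypothetical inputs).
    cutoff: positivity cut-off in ng/ml (the abstract varies it from 50 to 200).
    rule: "at_least_one", "both", or "mean" (names chosen here for illustration).
    """
    if rule == "at_least_one":
        # Positive if either sample reaches the cut-off.
        return sample_a >= cutoff or sample_b >= cutoff
    if rule == "both":
        # Positive only if both samples reach the cut-off.
        return sample_a >= cutoff and sample_b >= cutoff
    if rule == "mean":
        # Positive if the mean of the two samples reaches the cut-off.
        return (sample_a + sample_b) / 2 >= cutoff
    raise ValueError(f"unknown rule: {rule}")
```

For the same pair of samples, the "at_least_one" rule is the most permissive and "both" the most restrictive, which is why the abstract reports them as separate strategies.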

  17. Location tests for biomarker studies: a comparison using simulations for the two-sample case.

    Science.gov (United States)

    Scheinhardt, M O; Ziegler, A

    2013-01-01

    Gene, protein, or metabolite expression levels are often non-normally distributed, heavy tailed and contain outliers. Standard statistical approaches may fail as location tests in this situation. In three Monte-Carlo simulation studies, we aimed at comparing the type I error levels and empirical power of standard location tests and three adaptive tests [O'Gorman, Can J Stat 1997; 25: 269-279; Keselman et al., Brit J Math Stat Psychol 2007; 60: 267-293; Szymczak et al., Stat Med 2013; 32: 524-537] for a wide range of distributions. We simulated two-sample scenarios using the g-and-k-distribution family to systematically vary tail length and skewness with identical and varying variability between groups. All tests kept the type I error level when groups did not vary in their variability. The standard non-parametric U-test performed well in all simulated scenarios. It was outperformed by the two non-parametric adaptive methods in case of heavy tails or large skewness. Most tests did not keep the type I error level for skewed data in the case of heterogeneous variances. The standard U-test was a powerful and robust location test for most of the simulated scenarios except for very heavy tailed or heavy skewed data, and it is thus to be recommended except for these cases. The non-parametric adaptive tests were powerful for both normal and non-normal distributions under sample variance homogeneity. But when sample variances differed, they did not keep the type I error level. The parametric adaptive test lacks power for skewed and heavy tailed distributions.

  18. A Maturing Global Testing Regime Meets the World Economy: Test Scores and Economic Growth, 1960-2012

    Science.gov (United States)

    Kamens, David H.

    2015-01-01

    This article considers the growth of the international testing regime. It discusses sources of growth and empirically examines two related sets of issues: (1) the stability of countries' achievement scores, and (2) the influence of those national scores on subsequent economic development over different time lags. The article suggests that…

  19. Contributions of Hamstring Stiffness to Straight-Leg-Raise and Sit-and-Reach Test Scores.

    Science.gov (United States)

    Miyamoto, Naokazu; Hirata, Kosuke; Kimura, Noriko; Miyamoto-Mikami, Eri

    2018-02-01

    The passive straight-leg-raise (PSLR) and the sit-and-reach (SR) tests have been widely used to assess hamstring extensibility. However, it remains unclear to what extent hamstring stiffness (a measure of material properties) contributes to PSLR and SR test scores. Therefore, we aimed to clarify the relationship between hamstring stiffness and PSLR and SR scores using ultrasound shear wave elastography. Ninety-eight healthy subjects completed the study. Each subject completed PSLR testing, and classic and modified SR testing of the right leg. Muscle shear modulus of the biceps femoris, semitendinosus, and semimembranosus was quantified as an index of muscle stiffness. The relationships between shear modulus of each muscle and PSLR or SR scores were calculated using Pearson's product-moment correlation coefficients. Shear modulus of the semitendinosus and semimembranosus showed negative correlations with the two PSLR and two SR scores (absolute r value≤0.484). Shear modulus of the biceps femoris was significantly correlated with the PSLR score determined by the examiner and the modified SR score (absolute r value≤0.308). The present findings suggest that PSLR and SR test scores are strongly influenced by factors other than hamstring stiffness and therefore might not accurately evaluate hamstring stiffness. © Georg Thieme Verlag KG Stuttgart · New York.

  20. Spinal appearance questionnaire: factor analysis, scoring, reliability, and validity testing.

    Science.gov (United States)

    Carreon, Leah Y; Sanders, James O; Polly, David W; Sucato, Daniel J; Parent, Stefan; Roy-Beaudry, Marjolaine; Hopkins, Jeffrey; McClung, Anna; Bratcher, Kelly R; Diamond, Beverly E

    2011-08-15

    Cross sectional. This study presents the factor analysis of the Spinal Appearance Questionnaire (SAQ) and its psychometric properties. Although the SAQ has been administered to a large sample of patients with adolescent idiopathic scoliosis (AIS) treated surgically, its psychometric properties have not been fully evaluated. This study presents the factor analysis and scoring of the SAQ and evaluates its psychometric properties. The SAQ and the Scoliosis Research Society-22 (SRS-22) were administered to AIS patients who were being observed, braced or scheduled for surgery. Standard demographic data and radiographic measures including Lenke type and curve magnitude were also collected. Of the 1802 patients, 83% were female; with a mean age of 14.8 years and mean initial Cobb angle of 55.8° (range, 0°-123°). From the 32 items of the SAQ, 15 loaded on two factors with consistent and significant correlations across all Lenke types. There is an Appearance (items 1-10) and an Expectations factor (items 12-15). Responses are summed giving a range of 5 to 50 for the Appearance domain and 5 to 20 for the Expectations domain. The Cronbach's α was 0.88 for both domains and Total score with a test-retest reliability of 0.81 for Appearance and 0.91 for Expectations. Correlations with major curve magnitude were higher for the SAQ Appearance and SAQ Total scores compared to correlations between the SRS Appearance and SRS Total scores. The SAQ and SRS-22 Scores were statistically significantly different in patients who were scheduled for surgery compared to those who were observed or braced. The SAQ is a valid measure of self-image in patients with AIS with greater correlation to curve magnitude than SRS Appearance and Total score. It also discriminates between patients who require surgery from those who do not.
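
Cronbach's α, the internal-consistency statistic reported in this abstract, follows the standard formula α = k/(k-1) · (1 - Σ item variances / variance of total scores). The sketch below is a minimal illustration of that formula; the function name and the data layout are assumptions, and it does not reproduce the study's data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from item-level scores.

    items: list of k lists, one per questionnaire item, each holding one
    score per respondent in the same respondent order (hypothetical layout).
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score per respondent: sum across items.
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))
```

The function is undefined when every respondent has the same total score, because the variance of the totals is then zero.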

  1. Two-Sample Statistics for Testing the Equality of Survival Functions Against Improper Semi-parametric Accelerated Failure Time Alternatives: An Application to the Analysis of a Breast Cancer Clinical Trial

    Science.gov (United States)

    BROËT, PHILIPPE; TSODIKOV, ALEXANDER; DE RYCKE, YANN; MOREAU, THIERRY

    2010-01-01

    This paper presents two-sample statistics suited for testing equality of survival functions against improper semi-parametric accelerated failure time alternatives. These tests are designed for comparing either the short- or the long-term effect of a prognostic factor, or both. These statistics are obtained as partial likelihood score statistics from a time-dependent Cox model. As a consequence, the proposed tests can be very easily implemented using widely available software. A breast cancer clinical trial is presented as an example to demonstrate the utility of the proposed tests. PMID:15293627

  2. Two-sample statistics for testing the equality of survival functions against improper semi-parametric accelerated failure time alternatives: an application to the analysis of a breast cancer clinical trial.

    Science.gov (United States)

    Broët, Philippe; Tsodikov, Alexander; De Rycke, Yann; Moreau, Thierry

    2004-06-01

    This paper presents two-sample statistics suited for testing equality of survival functions against improper semi-parametric accelerated failure time alternatives. These tests are designed for comparing either the short- or the long-term effect of a prognostic factor, or both. These statistics are obtained as partial likelihood score statistics from a time-dependent Cox model. As a consequence, the proposed tests can be very easily implemented using widely available software. A breast cancer clinical trial is presented as an example to demonstrate the utility of the proposed tests.

  3. Data-driven efficient score tests for deconvolution hypotheses

    NARCIS (Netherlands)

    Langovoy, M.

    2008-01-01

    We consider testing statistical hypotheses about densities of signals in deconvolution models. A new approach to this problem is proposed. We constructed score tests for the deconvolution density testing with the known noise density and efficient score tests for the case of unknown density. The

  4. Optimal Scoring Methods of Hand-Strength Tests in Patients with Stroke

    Science.gov (United States)

    Huang, Sheau-Ling; Hsieh, Ching-Lin; Lin, Jau-Hong; Chen, Hui-Mei

    2011-01-01

    The purpose of this study was to determine the optimal scoring methods for measuring strength of the more-affected hand in patients with stroke by examining the effect of reducing measurement errors. Three hand-strength tests of grip, palmar pinch, and lateral pinch were administered at two sessions in 56 patients with stroke. Five scoring methods…

  5. Do Standardized Tests Penalize Deep-Thinking, Creative, or Conscientious Students?: Some Personality Correlates of Graduate Record Examinations Test Scores

    Science.gov (United States)

    Powers, Donald E.; Kaufman, James C.

    2004-01-01

    The objective of the study reported here was to explore the relationship of Graduate Record Examinations (GRE) General Test scores to selected personality traits--conscientiousness, rationality, ingenuity, quickness, creativity, and depth. A sample of 342 GRE test takers completed short personality inventory scales for each trait. Analyses…

  6. Comparison of Two Methods for Estimation of Work Limitation Scores from Health Status Measures

    DEFF Research Database (Denmark)

    Anatchkova, M; Fang, H; Kini, N

    2015-01-01

    Objectives: To compare two methods for estimation of Work Limitations Questionnaire scores (WLQ, 8 items) from the Role Physical (RP, 4 items) and Role Emotional scales (RE, 3 items) of the SF-36 Health survey. These measures assess limitations in role performance attributed to health (emotional...... future data collection strategies. Methods: We used data from two independent cross-sectional panel samples (Sample 1, n=1382, 51% female, 72% Caucasian, 49% with preselected chronic conditions, 15% with fair/poor health; Sample 2, n=301, 45% female, 90% Caucasian, 47% with preselected chronic conditions......, 21% with fair/poor health). Method 1 used previously developed and validated IRT based calibration tables. Method 2 used regression models to develop aggregate imputation weights as described in the literature. We evaluated the agreement of observed and estimated WLQ scale scores from the two methods...

  7. Comparison of two methods for composite score generation in dry eye syndrome.

    Science.gov (United States)

    See, Craig; Bilonick, Richard A; Feuer, William; Galor, Anat

    2013-09-19

    To compare two methods of composite score generation in dry eye syndrome (DES). Male patients seen in the Miami Veterans Affairs eye clinic with normal eyelid, corneal, and conjunctival anatomy were recruited to participate in the study. Patients filled out the Dry Eye Questionnaire 5 (DEQ5) and underwent measurement of tear film parameters. DES severity scores were generated by independent component analysis (ICA) and latent class analysis (LCA). A total of 247 men were included in the study. Mean age was 69 years (SD 9). Using ICA analysis, osmolarity was found to carry the largest weight, followed by eyelid vascularity and meibomian orifice plugging. Conjunctival injection and tear breakup time (TBUT) carried the lowest weights. Using LCA analysis, TBUT was found to be best at discriminating healthy from diseased eyes, followed closely by Schirmer's test. DEQ5, eyelid vascularity, and conjunctival injection were the poorest at discrimination. The adjusted correlation coefficient between the two generated composite scores was 0.63, indicating that the shared variance was less than 40%. Both ICA and LCA produced composite scores for dry eye severity, with weak to moderate agreement; however, agreement for the relative importance of single diagnostic tests was poor between the two methods.

  8. Facilitating the Interpretation of English Language Proficiency Scores: Combining Scale Anchoring and Test Score Mapping Methodologies

    Science.gov (United States)

    Powers, Donald; Schedl, Mary; Papageorgiou, Spiros

    2017-01-01

    The aim of this study was to develop, for the benefit of both test takers and test score users, enhanced "TOEFL ITP"® test score reports that go beyond the simple numerical scores that are currently reported. To do so, we applied traditional scale anchoring (proficiency scaling) to item difficulty data in order to develop performance…

  9. Italian normative data and validation of two neuropsychological tests of face recognition: Benton Facial Recognition Test and Cambridge Face Memory Test.

    Science.gov (United States)

    Albonico, Andrea; Malaspina, Manuela; Daini, Roberta

    2017-09-01

    The Benton Facial Recognition Test (BFRT) and Cambridge Face Memory Test (CFMT) are two of the most common tests used to assess face discrimination and recognition abilities and to identify individuals with prosopagnosia. However, recent studies highlighted that participant-stimulus match ethnicity, as much as gender, has to be taken into account in interpreting results from these tests. Here, in order to obtain more appropriate normative data for an Italian sample, the CFMT and BFRT were administered to a large cohort of young adults. We found that scores from the BFRT are not affected by participants' gender and are only slightly affected by participant-stimulus ethnicity match, whereas both these factors seem to influence the scores of the CFMT. Moreover, the inclusion of a sample of individuals with suspected face recognition impairment allowed us to show that the use of more appropriate normative data can increase the BFRT efficacy in identifying individuals with face discrimination impairments; by contrast, the efficacy of the CFMT in classifying individuals with a face recognition deficit was confirmed. Finally, our data show that the lack of inversion effect (the difference between the total score of the upright and inverted versions of the CFMT) could be used as further index to assess congenital prosopagnosia. Overall, our results confirm the importance of having norms derived from controls with a similar experience of faces as the "potential" prosopagnosic individuals when assessing face recognition abilities.

  10. External Validation of the Simple Clinical Score and the HOTEL Score, Two Scores for Predicting Short-Term Mortality after Admission to an Acute Medical Unit

    DEFF Research Database (Denmark)

    Stræde, Mia; Brabrand, Mikkel

    2014-01-01

    with the objective of validating the Simple Clinical Score (SCS) and the HOTEL score, two existing risk stratification systems that predict mortality for medical patients based solely on clinical information, but not only vital signs. METHODS: Pre-planned prospective observational cohort study. SETTING: Danish 460.......932 to 0.988) for 24-hours mortality and 0.826 (95% CI, 0.774-0.879) for 30-day mortality, and goodness-of-fit test, χ2 = 2.68 (10 degrees of freedom), P = 0.998 and χ2 = 4.00, P = 0.947, respectively. We included 1470 patients when calculating the HOTEL score. Discriminatory power (AUROC) was 0.931 (95......% CI, 0.901-0.962) for 24-hours mortality and goodness-of-fit test, χ2 = 5.56 (10 degrees of freedom), P = 0.234. CONCLUSION: We find that both the SCS and HOTEL scores showed an excellent to outstanding ability in identifying patients at high risk of dying with good or acceptable precision....

  11. Dividing the Force Concept Inventory into two equivalent half-length tests

    Directory of Open Access Journals (Sweden)

    Jing Han

    2015-05-01

    The Force Concept Inventory (FCI) is a 30-question multiple-choice assessment that has been a building block for much of the physics education research done today. In practice, there are often concerns regarding the length of the test and possible test-retest effects. Since many studies in the literature use the mean score of the FCI as the primary variable, it would be useful to have different shorter tests that can produce FCI-equivalent scores while providing the benefits of being quicker to administer and overcoming the test-retest effects. In this study, we divide the 1995 version of the FCI into two half-length tests; each contains a different subset of the original FCI questions. The two new tests are shorter, still cover the same set of concepts, and produce mean scores equivalent to those of the FCI. Using a large quantitative data set collected at a large midwestern university, we statistically compare the assessment features of the two half-length tests and the full-length FCI. The results show that the mean error of equivalent scores between any two of the three tests is within 3%. Scores from all tests are well correlated. Based on the analysis, it appears that the two half-length tests can be a viable option for score-based assessments that need to administer tests quickly or need to measure short-term gains where using identical pre- and post-test questions is a concern.

  12. Poisson Approximation-Based Score Test for Detecting Association of Rare Variants.

    Science.gov (United States)

    Fang, Hongyan; Zhang, Hong; Yang, Yaning

    2016-07-01

    Genome-wide association study (GWAS) has achieved great success in identifying genetic variants, but the nature of GWAS has determined its inherent limitations. Under the common disease rare variants (CDRV) hypothesis, the traditional association analysis methods commonly used in GWAS for common variants do not have enough power for detecting rare variants with a limited sample size. As a solution to this problem, pooling rare variants by their functions provides an efficient way for identifying susceptible genes. Rare variants typically have low frequencies of minor alleles, and the distribution of the total number of minor alleles of the rare variants can be approximated by a Poisson distribution. Based on this fact, we propose a new test method, the Poisson Approximation-based Score Test (PAST), for association analysis of rare variants. Two testing methods, namely, ePAST and mPAST, are proposed based on different strategies of pooling rare variants. Simulation results and application to the CRESCENDO cohort data show that our methods are more powerful than the existing methods. © 2016 John Wiley & Sons Ltd/University College London.

  13. The Truth about Scores Children Achieve on Tests.

    Science.gov (United States)

    Brown, Jonathan R.

    1989-01-01

    The importance of using the standard error of measurement (SEm) in determining reliability in test scores is emphasized. The SEm is compared to the hypothetical true score for standardized tests, and procedures for calculation of the SEm are explained. (JDD)
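
The abstract does not reproduce the calculation procedure, so the following is only a sketch of the standard classical test theory formula, SEm = SD · sqrt(1 - reliability), which may or may not match the article's presentation. The function names and the example values (SD of 15, reliability of 0.91) are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEm under classical test theory: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sd, reliability, z=1.0):
    """Band of +/- z SEm around an observed score.

    With z = 1.0 the band covers roughly 68% of the hypothetical
    distribution of observed scores around the true score, assuming
    normally distributed measurement error.
    """
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical example: a test with SD 15 and reliability 0.91.
sem = standard_error_of_measurement(15.0, 0.91)   # about 4.5 score points
low, high = score_band(100.0, 15.0, 0.91)         # about 95.5 to 104.5
```

Reporting the band rather than the single observed score makes the imprecision of the measurement visible to the reader of a score report.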

  14. Test sample handling apparatus

    International Nuclear Information System (INIS)

    1981-01-01

    A test sample handling apparatus using automatic scintillation counting for gamma detection, for use in such fields as radioimmunoassay, is described. The apparatus automatically and continuously counts large numbers of samples rapidly and efficiently by the simultaneous counting of two samples. By means of sequential ordering of non-sequential counting data, it is possible to obtain precisely ordered data while utilizing sample carrier holders having a minimum length. (U.K.)

  15. Two-Sample Tests for High-Dimensional Linear Regression with an Application to Detecting Interactions.

    Science.gov (United States)

    Xia, Yin; Cai, Tianxi; Cai, T Tony

    2018-01-01

    Motivated by applications in genomics, we consider in this paper global and multiple testing for the comparisons of two high-dimensional linear regression models. A procedure for testing the equality of the two regression vectors globally is proposed and shown to be particularly powerful against sparse alternatives. We then introduce a multiple testing procedure for identifying unequal coordinates while controlling the false discovery rate and false discovery proportion. Theoretical justifications are provided to guarantee the validity of the proposed tests and optimality results are established under sparsity assumptions on the regression coefficients. The proposed testing procedures are easy to implement. Numerical properties of the procedures are investigated through simulation and data analysis. The results show that the proposed tests maintain the desired error rates under the null and have good power under the alternative at moderate sample sizes. The procedures are applied to the Framingham Offspring study to investigate the interactions between smoking and cardiovascular related genetic mutations important for an inflammation marker.

  16. Identifying genetic marker sets associated with phenotypes via an efficient adaptive score test

    KAUST Repository

    Cai, T.

    2012-06-25

    In recent years, genome-wide association studies (GWAS) and gene-expression profiling have generated a large number of valuable datasets for assessing how genetic variations are related to disease outcomes. With such datasets, it is often of interest to assess the overall effect of a set of genetic markers, assembled based on biological knowledge. Genetic marker-set analyses have been advocated as more reliable and powerful approaches compared with the traditional marginal approaches (Curtis and others, 2005. Pathways to the analysis of microarray data. TRENDS in Biotechnology 23, 429-435; Efroni and others, 2007. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS One 2, 425). Procedures for testing the overall effect of a marker-set have been actively studied in recent years. For example, score tests derived under an Empirical Bayes (EB) framework (Liu and others, 2007. Semiparametric regression of multidimensional genetic pathway data: least-squares kernel machines and linear mixed models. Biometrics 63, 1079-1088; Liu and others, 2008. Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC bioinformatics 9, 292-2; Wu and others, 2010. Powerful SNP-set analysis for case-control genome-wide association studies. American Journal of Human Genetics 86, 929) have been proposed as powerful alternatives to the standard Rao score test (Rao, 1948. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 44, 50-57). The advantages of these EB-based tests are most apparent when the markers are correlated, due to the reduction in the degrees of freedom. In this paper, we propose an adaptive score test which up- or down-weights the contributions from each member of the marker-set based on the Z-scores of

  17. Individual Differences in Digit Span, Susceptibility to Proactive Interference, and Aptitude/Achievement Test Scores.

    Science.gov (United States)

    Dempster, Frank N.; Cooney, John B.

    1982-01-01

    Individual differences in digit span, susceptibility to proactive interference, and various aptitude/achievement test scores were investigated in two experiments with college students. Results indicated that digit span was strongly correlated with aptitude/achievement scores, but did not indicate that susceptibility to proactive interference…

  18. A general equation to obtain multiple cut-off scores on a test from multinomial logistic regression.

    Science.gov (United States)

    Bersabé, Rosa; Rivas, Teresa

    2010-05-01

    The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.
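
The equal-likelihood idea described in this abstract can be sketched under a baseline-category parameterization, where the log-odds of category j against the reference category are a_j + b_j·x. Categories j and j+1 are then equally likely at x = (a_j - a_{j+1}) / (b_{j+1} - b_j). This parameterization, the function name, and the coefficients below are assumptions for illustration; the authors' published equation may be written differently.

```python
def mlr_cutoffs(intercepts, slopes):
    """Cut-off scores between adjacent ordinal categories.

    intercepts[j], slopes[j]: baseline-category logit coefficients for
    category j, with the reference category given as (0.0, 0.0).
    Returns one cut-off per adjacent pair (j, j+1): the test score at which
    the two categories have equal predicted probability.
    """
    if len(intercepts) != len(slopes):
        raise ValueError("intercepts and slopes must have the same length")
    cutoffs = []
    for j in range(len(intercepts) - 1):
        slope_difference = slopes[j + 1] - slopes[j]
        if slope_difference == 0:
            raise ValueError("equal slopes: the two categories never cross")
        cutoffs.append((intercepts[j] - intercepts[j + 1]) / slope_difference)
    return cutoffs


# Hypothetical coefficients for three ordinal categories
# (reference category first); these are not the study's estimates.
cutoffs = mlr_cutoffs([0.0, -5.0, -12.5], [0.0, 0.25, 0.5])
```

With these made-up coefficients the two cut-offs fall at test scores of 20 and 30, splitting the score range into three bands.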

  19. Decision making under internal uncertainty: the case of multiple-choice tests with different scoring rules.

    Science.gov (United States)

    Bereby-Meyer, Yoella; Meyer, Joachim; Budescu, David V

    2003-02-01

    This paper assesses framing effects on decision making with internal uncertainty, i.e., partial knowledge, by focusing on examinees' behavior in multiple-choice (MC) tests with different scoring rules. In two experiments participants answered a general-knowledge MC test that consisted of 34 solvable and 6 unsolvable items. Experiment 1 studied two scoring rules involving Positive (only gains) and Negative (only losses) scores. Although answering all items was the dominating strategy for both rules, the results revealed a greater tendency to answer under the Negative scoring rule. These results are in line with the predictions derived from Prospect Theory (PT) [Econometrica 47 (1979) 263]. The second experiment studied two scoring rules, which allowed respondents to exhibit partial knowledge. Under the Inclusion-scoring rule the respondents mark all answers that could be correct, and under the Exclusion-scoring rule they exclude all answers that might be incorrect. As predicted by PT, respondents took more risks under the Inclusion rule than under the Exclusion rule. The results illustrate that the basic process that underlies choice behavior under internal uncertainty and especially the effect of framing is similar to the process of choice under external uncertainty and can be described quite accurately by PT. Copyright 2002 Elsevier Science B.V.

  20. External validation of the simple clinical score and the HOTEL score, two scores for predicting short-term mortality after admission to an acute medical unit.

    Science.gov (United States)

    Stræde, Mia; Brabrand, Mikkel

    2014-01-01

    Clinical scores can help predict early mortality after admission to a medical admission unit. A developed scoring system needs to be externally validated to minimise the risk of the discriminatory power and calibration being falsely elevated. We performed the present study with the objective of validating the Simple Clinical Score (SCS) and the HOTEL score, two existing risk stratification systems that predict mortality for medical patients based solely on clinical information, but not only vital signs. Pre-planned prospective observational cohort study. Danish 460-bed regional teaching hospital. We included 3046 consecutive patients from 2 October 2008 until 19 February 2009. 26 (0.9%) died within one calendar day and 196 (6.4%) died within 30 days. We calculated SCS for 1080 patients. We found an AUROC of 0.960 (95% confidence interval [CI], 0.932 to 0.988) for 24-hour mortality and 0.826 (95% CI, 0.774-0.879) for 30-day mortality, and goodness-of-fit test, χ² = 2.68 (10 degrees of freedom), P = 0.998 and χ² = 4.00, P = 0.947, respectively. We included 1470 patients when calculating the HOTEL score. Discriminatory power (AUROC) was 0.931 (95% CI, 0.901-0.962) for 24-hour mortality and goodness-of-fit test, χ² = 5.56 (10 degrees of freedom), P = 0.234. We find that both the SCS and HOTEL scores showed an excellent to outstanding ability in identifying patients at high risk of dying with good or acceptable precision.

  1. A quantitative assessment of alkaptonuria: testing the reliability of two disease severity scoring systems.

    Science.gov (United States)

    Cox, Trevor F; Ranganath, Lakshminarayan

    2011-12-01

    Alkaptonuria (AKU) is caused by excessive accumulation of homogentisic acid (HGA) in body fluids due to lack of the enzyme homogentisate dioxygenase, leading in turn to varied clinical manifestations, mainly by a process of conversion of HGA to a polymeric melanin-like pigment known as ochronosis. A potential treatment, a drug called nitisinone, to decrease formation of HGA is available. However, successful demonstration of its efficacy in modifying the natural history of AKU requires an effective quantitative assessment tool. We have described two potential tools that could be used to quantitate disease burden in AKU. One tool describes scoring the clinical features that includes clinical assessments, investigations and questionnaires in 15 patients with AKU. The second tool describes a scoring system that only includes items obtained from questionnaires used in 44 people with AKU. Statistical analyses were carried out on the two patient datasets to assess the AKU tools; these included the calculation of Cronbach's alpha, multidimensional scaling and simple linear regression analysis. The conclusion was that there was good evidence that the tools could be adopted as AKU assessment tools, but perhaps with further refinement before being used in the practical setting of a clinical trial.

  2. An examination of the RCMAS-2 scores across gender, ethnic background, and age in a large Asian school sample.

    Science.gov (United States)

    Ang, Rebecca P; Lowe, Patricia A; Yusof, Noradlin

    2011-12-01

    The present study investigated the factor structure, reliability, convergent and discriminant validity, and U.S. norms of the Revised Children's Manifest Anxiety Scale, Second Edition (RCMAS-2; C. R. Reynolds & B. O. Richmond, 2008a) scores in a Singapore sample of 1,618 school-age children and adolescents. Although there were small statistically significant differences in the average RCMAS-2 T scores found across various demographic groupings, on the whole, the U.S. norms appear adequate for use in the Asian Singapore sample. Results from item bias analyses suggested that biased items detected had small effects and were counterbalanced across gender and ethnicity, and hence, their relative impact on test score variation appears to be minimal. Results of factor analyses on the RCMAS-2 scores supported the presence of a large general anxiety factor, the Total Anxiety factor, and the 5-factor structure found in U.S. samples was replicated. Both the large general anxiety factor and the 5-factor solution were invariant across gender and ethnic background. Internal consistency estimates ranged from adequate to good, and 2-week test-retest reliability estimates were comparable to previous studies. Evidence providing support for convergent and discriminant validity of the RCMAS-2 scores was also found. Taken together, findings provide additional cross-cultural evidence of the appropriateness and usefulness of the RCMAS-2 as a measure of anxiety in Asian Singaporean school-age children and adolescents.

  3. The Mann-Whitney U: A Test for Assessing Whether Two Independent Samples Come from the Same Distribution

    Directory of Open Access Journals (Sweden)

    Nadim Nachar

    2008-03-01

    It is often difficult, particularly when conducting research in psychology, to have access to large normally distributed samples. Fortunately, there are statistical tests to compare two independent groups that do not require large normally distributed samples. The Mann-Whitney U is one of these tests. In the following work, a summary of this test is presented. The explanation of the logic underlying this test and its application are presented. Moreover, the strengths and weaknesses of the Mann-Whitney U are mentioned. One major limit of the Mann-Whitney U is that the type I error or alpha (α) is amplified in a situation of heteroscedasticity.
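
    As an illustration of the logic summarised in this record, the following is a minimal sketch of how the U statistic can be computed from pooled ranks, with tied values sharing the mean of their ranks. The function names `midranks` and `mann_whitney_u` are hypothetical and are not taken from the article; the sketch returns only the statistic, not a p-value.

```python
def midranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples x and y."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2  # pairs where x exceeds y (ties count 1/2)
    u_y = n1 * n2 - u_x
    return u_x, u_y


if __name__ == "__main__":
    print(mann_whitney_u([1, 2, 2], [2, 3]))
```

    The two values always sum to the product of the sample sizes; the smaller one is usually compared against tabulated critical values or a normal approximation.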

  4. Improving personality facet scores with multidimensional computer adaptive testing

    DEFF Research Database (Denmark)

    Makransky, Guido; Mortensen, Erik Lykke; Glas, Cees A W

    2013-01-01

    personality tests contain many highly correlated facets. This article investigates the possibility of increasing the precision of the NEO PI-R facet scores by scoring items with multidimensional item response theory and by efficiently administering and scoring items with multidimensional computer adaptive...

  5. Reporting Diagnostic Scores in Educational Testing: Temptations, Pitfalls, and Some Solutions

    Science.gov (United States)

    Sinharay, Sandip; Puhan, Gautam; Haberman, Shelby J.

    2010-01-01

    Diagnostic scores are of increasing interest in educational testing due to their potential remedial and instructional benefit. Naturally, the number of educational tests that report diagnostic scores is on the rise, as are the number of research publications on such scores. This article provides a critical evaluation of diagnostic score reporting…

  6. Testing statistical significance scores of sequence comparison methods with structure similarity

    Directory of Open Access Journals (Sweden)

    Leunissen Jack AM

    2006-10-01

    Background: In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is not only determined by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. Results: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. Conclusion: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.

  7. Factor structure and invariance test of the alcohol use disorder identification test (AUDIT): Comparison and further validation in a U.S. and Philippines college student sample.

    Science.gov (United States)

    Tuliao, Antover P; Landoy, Bernice Vania N; McChargue, Dennis E

    2016-01-01

    The Alcohol Use Disorder Identification Test's factor structure varies depending on population and culture. Because of this inconsistency, this article examined the factor structure of the test and conducted a factorial invariance test between a U.S. and a Philippines college sample. Confirmatory factor analyses indicated that a three-factor solution outperforms the one- and two-factor solution in both samples. Factorial invariance analyses further supports the confirmatory findings by showing that factor loadings were generally invariant across groups; however, item intercepts show non-invariance. Country differences between factors show that Filipino consumption factor mean scores were significantly lower than their U.S. counterparts.

  8. The Effects of Video Game Experience on Computer-Based Air Traffic Controller Specialist, Air Traffic Scenario Test Scores.

    Science.gov (United States)

    1997-02-01

    application with a strong resemblance to a video game, concern has been raised that prior video game experience might have a moderating effect on scores. Much...such as spatial ability. The effects of computer or video game experience on work sample scores have not been systematically investigated. The purpose...of this study was to evaluate the incremental validity of prior video game experience over that of general aptitude as a predictor of work sample test

  9. The two-sample problem with induced dependent censorship.

    Science.gov (United States)

    Huang, Y

    1999-12-01

    Induced dependent censorship is a general phenomenon in health service evaluation studies in which a measure such as quality-adjusted survival time or lifetime medical cost is of interest. We investigate the two-sample problem and propose two classes of nonparametric tests. Based on consistent estimation of the survival function for each sample, the two classes of test statistics examine the cumulative weighted difference in hazard functions and in survival functions. We derive a unified asymptotic null distribution theory and inference procedure. The tests are applied to trial V of the International Breast Cancer Study Group and show that long duration chemotherapy significantly improves time without symptoms of disease and toxicity of treatment as compared with the short duration treatment. Simulation studies demonstrate that the proposed tests, with a wide range of weight choices, perform well under moderate sample sizes.

  10. Two sampling techniques for game meat

    Directory of Open Access Journals (Sweden)

    Maretha van der Merwe

    2013-03-01

    A study was conducted to compare the excision sampling technique used by the export market and the sampling technique preferred by European countries, namely the biotrace cattle and swine test. The measuring unit for the excision sampling was grams (g) and square centimetres (cm2) for the swabbing technique. The two techniques were compared after a pilot test was conducted on spiked approved beef carcasses (n = 12) that statistically proved the two measuring units correlated. The two sampling techniques were conducted on the same game carcasses (n = 13) and analyses performed for aerobic plate count (APC), Escherichia coli and Staphylococcus aureus, for both techniques. A more representative result was obtained by swabbing and no damage was caused to the carcass. Conversely, the excision technique yielded fewer organisms and caused minor damage to the carcass. The recovery ratio from the sampling technique improved 5.4 times for APC, 108.0 times for E. coli and 3.4 times for S. aureus over the results obtained from the excision technique. It was concluded that the sampling methods of excision and swabbing can be used to obtain bacterial profiles from both export and local carcasses and could be used to indicate whether game carcasses intended for the local market are possibly on par with game carcasses intended for the export market and therefore safe for human consumption.

  11. Two sampling techniques for game meat.

    Science.gov (United States)

    van der Merwe, Maretha; Jooste, Piet J; Hoffman, Louw C; Calitz, Frikkie J

    2013-03-20

    A study was conducted to compare the excision sampling technique used by the export market and the sampling technique preferred by European countries, namely the biotrace cattle and swine test. The measuring unit for the excision sampling was grams (g) and square centimetres (cm2) for the swabbing technique. The two techniques were compared after a pilot test was conducted on spiked approved beef carcasses (n = 12) that statistically proved the two measuring units correlated. The two sampling techniques were conducted on the same game carcasses (n = 13) and analyses performed for aerobic plate count (APC), Escherichia coli and Staphylococcus aureus, for both techniques. A more representative result was obtained by swabbing and no damage was caused to the carcass. Conversely, the excision technique yielded fewer organisms and caused minor damage to the carcass. The recovery ratio from the sampling technique improved 5.4 times for APC, 108.0 times for E. coli and 3.4 times for S. aureus over the results obtained from the excision technique. It was concluded that the sampling methods of excision and swabbing can be used to obtain bacterial profiles from both export and local carcasses and could be used to indicate whether game carcasses intended for the local market are possibly on par with game carcasses intended for the export market and therefore safe for human consumption.

  12. Validity and reliability of Abbreviated Mental Test Score (AMTS) among older Iranian.

    Science.gov (United States)

    Foroughan, Mahshid; Wahlund, Lars-Olof; Jafari, Zahra; Rahgozar, Mehdi; Farahani, Ida G; Rashedi, Vahid

    2017-11-01

    Cognitive impairment is common among older people and is associated with increased morbidity and mortality. The main aim of this study was to evaluate the validity of the Persian version of the Abbreviated Mental Test Score (AMTS) as a screening tool for dementia. Data were obtained from a cross-sectional study. One hundred and one older adults who were members of Iranian Alzheimer Association and 101 of their siblings were entered into this study by convenience sampling. The Diagnostic and Statistical Manual of Mental Disorders, 4th edition, criteria for diagnosing dementia and the Mini-Mental State Examination were used as the study tools. The gathered data were analyzed by the Mann-Whitney U-test, the Kruskal-Wallis test, Spearman's rank correlation coefficient, and the receiver-operating characteristic. The AMTS could successfully differentiate the dementia group from the non-dementia group. Scores were significantly correlated with Diagnostic and Statistical Manual of Mental Disorders diagnosis for dementia and Mini-Mental State Examination scores (P < 0.001). Educational level (P < 0.001) and male sex (P = 0.015) were positively associated with AMTS, whereas age (P < 0.001) was negatively associated with AMTS. Total Cronbach's α coefficient was 0.90. The scores 6 and 7 showed the optimum balance between sensitivity (99% and 94%, respectively) and specificity (85% and 86%, respectively). The Persian version of the AMTS is a valid cognitive assessment tool for older Iranian adults and can be used for dementia screening in Iran. © 2017 Japanese Psychogeriatric Society.

  13. A process dissociation approach to objective-projective test score interrelationships.

    Science.gov (United States)

    Bornstein, Robert F

    2002-02-01

    Even when self-report and projective measures of a given trait or motive both predict theoretically related features of behavior, scores on the 2 tests correlate modestly with each other. This article describes a process dissociation framework for personality assessment, derived from research on implicit memory and learning, which can resolve these ostensibly conflicting results. Research on interpersonal dependency is used to illustrate 3 key steps in the process dissociation approach: (a) converging behavioral predictions, (b) modest test score intercorrelations, and (c) delineation of variables that differentially affect self-report and projective test scores. Implications of the process dissociation framework for personality assessment and test development are discussed.

  14. Sample size calculation to externally validate scoring systems based on logistic regression models.

    Directory of Open Access Journals (Sweden)

    Antonio Palazón-Bru

    A sample size containing at least 100 events and 100 non-events has been suggested to validate a predictive model, regardless of the model being validated and that certain factors can influence calibration of the predictive model (discrimination, parameterization and incidence). Scoring systems based on binary logistic regression models are a specific type of predictive model. The aim of this study was to develop an algorithm to determine the sample size for validating a scoring system based on a binary logistic regression model and to apply it to a case study. The algorithm was based on bootstrap samples in which the area under the ROC curve, the observed event probabilities through smooth curves, and a measure to determine the lack of calibration (estimated calibration index) were calculated. To illustrate its use for interested researchers, the algorithm was applied to a scoring system, based on a binary logistic regression model, to determine mortality in intensive care units. In the case study provided, the algorithm obtained a sample size with 69 events, which is lower than the value suggested in the literature. An algorithm is provided for finding the appropriate sample size to validate scoring systems based on binary logistic regression models. This could be applied to determine the sample size in other similar cases.
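
    The bootstrap step mentioned in this record can be sketched as follows. This is only an illustration of resampling the area under the ROC curve, not the authors' algorithm: it omits the smooth calibration curves and the estimated calibration index, and the function names, the percentile interval and the default of 1000 resamples are all assumptions.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve as P(event score > non-event score); ties count 1/2."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(non_events))


def bootstrap_auc_interval(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (hypothetical helper)."""
    auc(scores, labels)  # fails early if one class is missing entirely
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
    y = [0, 0, 1, 1, 0, 1]
    print(auc(s, y), bootstrap_auc_interval(s, y, n_boot=200))
```

    Repeating such an interval calculation for increasing candidate sample sizes is one way the width of the interval could be tracked, though the stopping rule used in the study is not stated in the abstract.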

  15. A prognostic scoring system for arm exercise stress testing.

    Science.gov (United States)

    Xie, Yan; Xian, Hong; Chandiramani, Pooja; Bainter, Emily; Wan, Leping; Martin, Wade H

    2016-01-01

    Arm exercise stress testing may be an equivalent or better predictor of mortality outcome than pharmacological stress imaging for the ≥50% of patients unable to perform leg exercise. Thus, our objective was to develop an arm exercise ECG stress test scoring system, analogous to the Duke Treadmill Score, for predicting outcome in these individuals. In this retrospective observational cohort study, arm exercise ECG stress tests were performed in 443 consecutive veterans aged 64.1 (11.1) years (mean (SD)) between 1997 and 2002. From multivariate Cox models, arm exercise scores were developed for prediction of 5-year and 12-year all-cause and cardiovascular mortality and 5-year cardiovascular mortality or myocardial infarction (MI). Arm exercise capacity in resting metabolic equivalents (METs), 1 min heart rate recovery (HRR) and ST segment depression ≥1 mm were the stress test variables independently associated with all-cause and cardiovascular mortality by step-wise Cox analysis (all p...). ...C-statistic of 0.81 before and 0.88 after adjustment for significant demographic and clinical covariates. Arm exercise scores for the other outcome end points yielded C-statistic values of 0.77-0.79 before and 0.82-0.86 after adjustment for significant covariates versus 0.64-0.72 for best fit pharmacological myocardial perfusion imaging models in a cohort of 1730 veterans who were evaluated over the same time period. Arm exercise scores, analogous to the Duke Treadmill Score, have good power for prediction of mortality or MI in patients who cannot perform leg exercise.

  16. Graphics for the multivariate two-sample problem

    International Nuclear Information System (INIS)

    Friedman, J.H.; Rafsky, L.C.

    1981-01-01

    Some graphical methods for comparing multivariate samples are presented. These methods are based on minimal spanning tree techniques developed for multivariate two-sample tests. The utility of these methods is illustrated through examples using both real and artificial data

  17. ANOVA Analysis of Student Daily Test Scores in Multi-Day Test Periods

    Science.gov (United States)

    Mouritsen, Matthew L.; Davis, Jefferson T.; Jones, Steven C.

    2016-01-01

    Instructors are often concerned when giving multiple-day tests because students taking the test later in the exam period may have an advantage over students taking the test early in the exam period due to information leakage. However, exam scores seemed to decline as students took the same test later in a multi-day exam period (Mouritsen and…

  18. The Effect of Pretest Exercise on Baseline Computerized Neurocognitive Test Scores.

    Science.gov (United States)

    Pawlukiewicz, Alec; Yengo-Kahn, Aaron M; Solomon, Gary

    2017-10-01

    Baseline neurocognitive assessment plays a critical role in return-to-play decision making following sport-related concussions. Prior studies have assessed the effect of a variety of modifying factors on neurocognitive baseline test scores. However, relatively little investigation has been conducted regarding the effect of pretest exercise on baseline testing. The aim of our investigation was to determine the effect of pretest exercise on baseline Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) scores in adolescent and young adult athletes. We hypothesized that athletes undergoing self-reported strenuous exercise within 3 hours of baseline testing would perform more poorly on neurocognitive metrics and would report a greater number of symptoms than those who had not completed such exercise. Cross-sectional study; Level of evidence, 3. The ImPACT records of 18,245 adolescent and young adult athletes were retrospectively analyzed. After application of inclusion and exclusion criteria, participants were dichotomized into groups based on a positive (n = 664) or negative (n = 6609) self-reported history of strenuous exercise within 3 hours of the baseline test. Participants with a positive history of exercise were then randomly matched, based on age, sex, education level, concussion history, and hours of sleep prior to testing, on a 1:2 basis with individuals who had reported no pretest exercise. The baseline ImPACT composite scores of the 2 groups were then compared. Significant differences were observed for the ImPACT composite scores of verbal memory, visual memory, reaction time, and impulse control as well as for the total symptom score. No significant between-group difference was detected for the visual motor composite score. Furthermore, pretest exercise was associated with a significant increase in the overall frequency of invalid test results. Our results suggest a statistically significant difference in ImPACT composite scores between

  19. Robust joint score tests in the application of DNA methylation data analysis.

    Science.gov (United States)

    Li, Xuan; Fu, Yuejiao; Wang, Xiaogang; Qiu, Weiliang

    2018-05-18

    Recently differential variability has been shown to be valuable in evaluating the association of DNA methylation to the risks of complex human diseases. The statistical tests based on both differential methylation level and differential variability can be more powerful than those based only on differential methylation level. Anh and Wang (2013) proposed a joint score test (AW) to simultaneously detect differential methylation and differential variability. However, AW's method seems to be quite conservative and has not been fully compared with existing joint tests. We proposed three improved joint score tests, namely iAW.Lev, iAW.BF, and iAW.TM, and have made extensive comparisons with the joint likelihood ratio test (jointLRT), the Kolmogorov-Smirnov (KS) test, and the AW test. Systematic simulation studies showed that: 1) the three improved tests performed better (i.e., having larger power, while keeping nominal Type I error rates) than the other three tests for data with outliers and having different variances between cases and controls; 2) for data from normal distributions, the three improved tests had slightly lower power than jointLRT and AW. The analyses of two Illumina HumanMethylation27 data sets GSE37020 and GSE20080 and one Illumina Infinium MethylationEPIC data set GSE107080 demonstrated that the three improved tests had higher true validation rates than those from jointLRT, KS, and AW. The three proposed joint score tests are robust against the violation of normality assumption and presence of outlying observations in comparison with the other three existing tests. Among the three proposed tests, iAW.BF seems to be the most robust and effective one for all simulated scenarios and also in real data analyses.

  20. Are two systemic fish assemblage sampling programmes on the upper Mississippi River telling us the same thing?

    Science.gov (United States)

    Dukerschein, J.T.; Bartels, A.D.; Ickes, B.S.; Pearson, M.S.

    2013-01-01

    We applied an Index of Biotic Integrity (IBI) used on Wisconsin/Minnesota waters of the upper Mississippi River (UMR) to compare data from two systemic sampling programmes. Ability to use data from multiple sampling programmes could extend spatial and temporal coverage of river assessment and monitoring efforts. We normalized for effort and tested fish community data collected by the Environmental Monitoring and Assessment Program-Great Rivers Ecosystems (EMAP-GRE) 2004–2006 and the Long Term Resource Monitoring Program (LTRMP) 1993–2006. Each programme used daytime electrofishing along main channel borders but with some methodological and design differences. EMAP-GRE, designed for baseline and, eventually, compliance monitoring, used a probabilistic, continuous design. LTRMP, designed primarily for baseline and trend monitoring, used a stratified random design in five discrete study reaches. Analysis of similarity indicated no significant difference between EMAP-GRE and LTRMP IBI scores (n=238; Global R= 0.052; significance level=0.972). Both datasets distinguished clear differences only between 'Fair' and 'Poor' condition categories, potentially supporting a 'pass–fail' assessment strategy. Thirteen years of LTRMP data demonstrated stable IBI scores through time in four of five reaches sampled. LTRMP and EMAP-GRE IBI scores correlated along the UMR's upstream to downstream gradient (df [3, 25]; F=1.61; p=0.22). A decline in IBI scores from upstream to downstream was consistent with UMR fish community studies and a previous, empirically modelled human disturbance gradient. Comparability between EMAP-GRE (best upstream to downstream coverage) and LTRMP data (best coverage over time and across the floodplain) supports a next step of developing and testing a systemic, multi-metric fish index on the UMR that both approaches could inform.

  1. The Effect of Mock Tests on Iranian EFL learners’ Test Scores

    OpenAIRE

    Hossein Khodabakhshzadeh; Reza Zardkanloo

    2016-01-01

    The effect of using tests in test preparation courses has been subject to debate. While some scholars such as Yang and Badger (2015) believe it is a cause of positive washback effect, others argue that this issue is tentative and context-bound (Green, 2007). Therefore, this study investigated the effect of using Mock tests in International English Language Testing System (IELTS) preparation courses on students’ overall IELTS scores. Fifty one IELTS students were selected non-randomly through ...

  2. Biering-Sorensen test scores in coal miners

    Energy Technology Data Exchange (ETDEWEB)

    Tekin, Y.; Ortancil, O.; Ankarali, H.; Basaran, A.; Sarikaya, S.; Ozdolap, S. [Zonguldak Karaelmas University, Zonguldak (Turkey)]

    2009-05-15

    Biering-Sorensen test is an isometric back endurance test. Biering-Sorensen test scores have varied in different cultural and occupational groups. The aims of this study were to collect normative data on Biering-Sorensen holding times, to determine the discriminative ability of the Biering-Sorensen test in Turkish coal miners, and to examine the association between Biering-Sorensen test result and functional disability. One hundred and fifty male coal miners participated in this study. Trunk extensor muscle strength was measured using the Biering-Sorensen test. Oswestry disability index was used to measure the functional disability level of low back pain. The mean Biering-Sorensen holding time for the total subject group was 107.3 ± 22.5 s. The mean Biering-Sorensen holding times of the subjects with and without low back pain were 99.9 ± 19.8 and 128.6 ± 15.2 s, respectively. The difference between the subjects with and without low back pain was statistically significant (p < 0.001). There was a statistically significant negative correlation between Oswestry functional disability score and Biering-Sorensen holding time (R = -0.824, p < 0.001). Turkish coal miners have low mean back extensor endurance holding times. Biering-Sorensen test had a good discriminative ability in our study group. Trunk muscle strength has a significant effect on the disability level of low back pain. Thus trunk muscle endurance training exercise therapy may be effective for the reduction of disability in patients with low back pain.

  3. Cognitive disparities, lead plumbing, and water chemistry: prior exposure to water-borne lead and intelligence test scores among World War Two U.S. Army enlistees.

    Science.gov (United States)

    Ferrie, Joseph P; Rolf, Karen; Troesken, Werner

    2012-01-01

    Higher prior exposure to water-borne lead among male World War Two U.S. Army enlistees was associated with lower intelligence test scores. Exposure was proxied by urban residence and the water pH levels of the cities where enlistees lived in 1930. Army General Classification Test scores were six points lower (nearly 1/3 standard deviation) where pH was 6 (so the water lead concentration for a given amount of lead piping was higher) than where pH was 7 (so the concentration was lower). This difference rose with time exposed. At this time, the dangers of exposure to lead in water were not widely known and lead was ubiquitous in water systems, so these results are not likely the effect of individuals selecting into locations with different levels of exposure. Copyright © 2011 Elsevier B.V. All rights reserved.

  4. Assessing Exhaustiveness of Stochastic Sampling for Integrative Modeling of Macromolecular Structures.

    Science.gov (United States)

    Viswanath, Shruthi; Chemmama, Ilan E; Cimermancic, Peter; Sali, Andrej

    2017-12-05

    Modeling of macromolecular structures involves structural sampling guided by a scoring function, resulting in an ensemble of good-scoring models. By necessity, the sampling is often stochastic, and must be exhaustive at a precision sufficient for accurate modeling and assessment of model uncertainty. Therefore, the very first step in analyzing the ensemble is an estimation of the highest precision at which the sampling is exhaustive. Here, we present an objective and automated method for this task. As a proxy for sampling exhaustiveness, we evaluate whether two independently and stochastically generated sets of models are sufficiently similar. The protocol includes testing 1) convergence of the model score, 2) whether model scores for the two samples were drawn from the same parent distribution, 3) whether each structural cluster includes models from each sample proportionally to its size, and 4) whether there is sufficient structural similarity between the two model samples in each cluster. The evaluation also provides the sampling precision, defined as the smallest clustering threshold that satisfies the third, most stringent test. We validate the protocol with the aid of enumerated good-scoring models for five illustrative cases of binary protein complexes. Passing the proposed four tests is necessary, but not sufficient for thorough sampling. The protocol is general in nature and can be applied to the stochastic sampling of any set of models, not just structural models. In addition, the tests can be used to stop stochastic sampling as soon as exhaustiveness at desired precision is reached, thereby improving sampling efficiency; they may also help in selecting a model representation that is sufficiently detailed to be informative, yet also sufficiently coarse for sampling to be exhaustive. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
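
    The second test in this protocol asks whether the scores of two independently generated model samples come from the same parent distribution. The abstract does not name the statistic used, so the following is a sketch under the assumption that a two-sample Kolmogorov-Smirnov statistic with a permutation p-value is an acceptable stand-in; `ks_statistic` and `permutation_p_value` are hypothetical names.

```python
import random


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both CDFs past every observation equal to x.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Share of random relabellings whose KS statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p above zero


if __name__ == "__main__":
    sample_1 = [10.2, 11.5, 9.8, 10.9, 10.1, 11.0]
    sample_2 = [10.4, 11.2, 9.9, 10.7, 10.3, 11.1]
    print(ks_statistic(sample_1, sample_2), permutation_p_value(sample_1, sample_2))
```

    A large p-value here only fails to reject equality of the two score distributions; as the abstract notes, passing such tests is necessary but not sufficient for thorough sampling.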

  5. A score based on screening tests to differentiate mild cognitive impairment from subjective memory complaints

    Directory of Open Access Journals (Sweden)

    Fábio Henrique de Gobbi Porto

    2013-09-01

    It is not easy to differentiate patients with mild cognitive impairment (MCI) from subjective memory complainers (SMC). Assessments with screening cognitive tools are essential, particularly in primary care where most patients are seen. The objective of this study was to evaluate the diagnostic accuracy of screening cognitive tests and to propose a score derived from screening tests. Elderly subjects with memory complaints were evaluated using the Mini Mental State Examination (MMSE) and the Brief Cognitive Battery (BCB). We added two delayed recalls in the MMSE (a delayed recall and a late-delayed recall, LDR), and also a phonemic fluency test of letter P fluency (LPF). A score was created based on these tests. The diagnoses were made on the basis of clinical consensus and neuropsychological testing. Receiver operating characteristic curve analyses were used to determine area under the curve (AUC), the sensitivity and specificity for each test separately and for the final proposed score. MMSE, LDR, LPF and delayed recall of BCB scores reach statistically significant differences between groups (P=0.000, 0.03, 0.001 and 0.01, respectively). Sensitivity, specificity and AUC were MMSE: 64%, 79% and 0.75 (cut off <29); LDR: 56%, 62% and 0.62 (cut off <3); LPF: 71%, 71% and 0.71 (cut off <14); delayed recall of BCB: 56%, 82% and 0.68 (cut off <9). The proposed score reached a sensitivity of 88% and 76% and specificity of 62% and 75% for cut off over 1 and over 2, respectively. AUC was 0.81. In conclusion, a score created from screening tests is capable of discriminating MCI from SMC with moderate to good accuracy.
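
    The sensitivity and specificity figures reported for each cut off can be reproduced from raw data along the following lines. This is a minimal sketch that assumes a subject is test-positive when the score falls below the cut off (as the "<29" style cut offs suggest); the function name and the example data are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Return (sensitivity, specificity), flagging score < cutoff as test-positive."""
    tp = fn = tn = fp = 0
    for score, condition in zip(scores, has_condition):
        positive = score < cutoff
        if condition and positive:
            tp += 1  # impaired and flagged
        elif condition:
            fn += 1  # impaired but missed
        elif positive:
            fp += 1  # not impaired but flagged
        else:
            tn += 1  # not impaired and not flagged
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Made-up screening scores; True marks subjects with the condition.
    scores = [25, 28, 29, 30, 27, 30]
    has_condition = [True, True, True, False, False, False]
    print(sensitivity_specificity(scores, has_condition, cutoff=29))
```

    Evaluating the same function over every candidate cut off gives the points of the ROC curve from which the AUC is derived.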

  6. Conclusion of LOD-score analysis for family data generated under two-locus models.

    Science.gov (United States)

    Dizier, M H; Babron, M C; Clerget-Darpoux, F

    1996-06-01

    The power to detect linkage by the LOD-score method is investigated here for diseases that depend on the effects of two genes. The classical strategy is, first, to detect a major-gene (MG) effect by segregation analysis and, second, to seek for linkage with genetic markers by the LOD-score method using the MG parameters. We already showed that segregation analysis can lead to evidence for a MG effect for many two-locus models, with the estimates of the MG parameters being very different from those of the two genes involved in the disease. We show here that use of these MG parameter estimates in the LOD-score analysis may lead to a failure to detect linkage for some two-locus models. For these models, use of the sib-pair method gives a non-negligible increase of power to detect linkage. The linkage-homogeneity test among subsamples differing for the familial disease distribution provides evidence of parameter misspecification, when the MG parameters are used. Moreover, for most of the models, use of the MG parameters in LOD-score analysis leads to a large bias in estimation of the recombination fraction and sometimes also to a rejection of linkage for the true recombination fraction. A final important point is that a strong evidence of an MG effect, obtained by segregation analysis, does not necessarily imply that linkage will be detected for at least one of the two genes, even with the true parameters and with a close informative marker.

  7. Conclusions of LOD-score analysis for family data generated under two-locus models

    Energy Technology Data Exchange (ETDEWEB)

    Dizier, M.H.; Babron, M.C.; Clerget-Darpoux, F. [Unite de Recherches d'Epidemiologie Genetique, Paris (France)]

    1996-06-01

    The power to detect linkage by the LOD-score method is investigated here for diseases that depend on the effects of two genes. The classical strategy is, first, to detect a major-gene (MG) effect by segregation analysis and, second, to seek for linkage with genetic markers by the LOD-score method using the MG parameters. We already showed that segregation analysis can lead to evidence for a MG effect for many two-locus models, with the estimates of the MG parameters being very different from those of the two genes involved in the disease. We show here that use of these MG parameter estimates in the LOD-score analysis may lead to a failure to detect linkage for some two-locus models. For these models, use of the sib-pair method gives a non-negligible increase of power to detect linkage. The linkage-homogeneity test among subsamples differing for the familial disease distribution provides evidence of parameter misspecification, when the MG parameters are used. Moreover, for most of the models, use of the MG parameters in LOD-score analysis leads to a large bias in estimation of the recombination fraction and sometimes also to a rejection of linkage for the true recombination fraction. A final important point is that a strong evidence of an MG effect, obtained by segregation analysis, does not necessarily imply that linkage will be detected for at least one of the two genes, even with the true parameters and with a close informative marker. 17 refs., 3 tabs.

  8. Inconsistencies between alcohol screening results based on AUDIT-C scores and reported drinking on the AUDIT-C questions: prevalence in two US national samples

    Science.gov (United States)

    2014-01-01

    Background The AUDIT-C is an extensively validated screen for unhealthy alcohol use (i.e. drinking above recommended limits or alcohol use disorder), which consists of three questions about alcohol consumption. AUDIT-C scores ≥4 points for men and ≥3 for women are considered positive screens based on US validation studies that compared the AUDIT-C to “gold standard” measures of unhealthy alcohol use from independent, detailed interviews. However, results of screening—positive or negative based on AUDIT-C scores—can be inconsistent with reported drinking on the AUDIT-C questions. For example, individuals can screen positive based on the AUDIT-C score while reporting drinking below US recommended limits on the same AUDIT-C. Alternatively, they can screen negative based on the AUDIT-C score while reporting drinking above US recommended limits. Such inconsistencies could complicate interpretation of screening results, but it is unclear how often they occur in practice. Methods This study used AUDIT-C data from respondents who reported past-year drinking on one of two national US surveys: a general population survey (N = 26,610) and a Veterans Health Administration (VA) outpatient survey (N = 467,416). Gender-stratified analyses estimated the prevalence of AUDIT-C screen results—positive or negative screens based on the AUDIT-C score—that were inconsistent with reported drinking (above or below US recommended limits) on the same AUDIT-C. Results Among men who reported drinking, 13.8% and 21.1% of US general population and VA samples, respectively, had screening results based on AUDIT-C scores (positive or negative) that were inconsistent with reported drinking on the AUDIT-C questions (above or below US recommended limits). Among women who reported drinking, 18.3% and 20.7% of US general population and VA samples, respectively, had screening results that were inconsistent with reported drinking. Limitations This study did not include an

  9. Gender, Stereotype Threat and Mathematics Test Scores

    OpenAIRE

    Ming Tsui; Xiao Y. Xu; Edmond Venator

    2011-01-01

    Problem statement: Stereotype threat has repeatedly been shown to depress women's scores on difficult math tests. An attempt to replicate these findings in China found no support for the stereotype threat hypothesis. Our math test was characterized as being personally important for the student participants, an atypical condition in most stereotype threat laboratory research. Approach: To evaluate the effects of this personal demand, we conducted three experiments. Results: ...

  10. The Five-Factor Narcissism Inventory (FFNI): a test of the convergent, discriminant, and incremental validity of FFNI scores in clinical and community samples.

    Science.gov (United States)

    Miller, Joshua D; Few, Lauren R; Wilson, Lauren; Gentile, Brittany; Widiger, Thomas A; Mackillop, James; Keith Campbell, W

    2013-09-01

    The five-factor narcissism inventory (FFNI) is a new self-report measure that was developed to assess traits associated with narcissistic personality disorder (NPD), as well as grandiose and vulnerable narcissism from a five-factor model (FFM) perspective. In the current study, the FFNI was examined in relation to Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev.; DSM-IV; American Psychiatric Association, 2000) NPD, DSM-5 (http://www.dsm5.org) NPD traits, grandiose narcissism, and vulnerable narcissism in both community (N = 287) and clinical samples (N = 98). Across the samples, the FFNI scales manifested good convergent and discriminant validity such that FFNI scales derived from FFM neuroticism were primarily related to vulnerable narcissism scores, scales derived from FFM extraversion were primarily related to grandiose scores, and FFNI scales derived from FFM agreeableness were related to both narcissism dimensions, as well as the DSM-IV and DSM-5 NPD scores. The FFNI grandiose and vulnerable narcissism composites also demonstrated incremental validity in the statistical prediction of these scores, above and beyond existing measures of DSM NPD, grandiose narcissism, and vulnerable narcissism, respectively. The FFNI is a promising measure that provides a comprehensive assessment of narcissistic pathology while maintaining ties to the significant general personality literature on the FFM.

  11. Polygenic scores predict alcohol problems in an independent sample and show moderation by the environment.

    Science.gov (United States)

    Salvatore, Jessica E; Aliev, Fazil; Edwards, Alexis C; Evans, David M; Macleod, John; Hickman, Matthew; Lewis, Glyn; Kendler, Kenneth S; Loukola, Anu; Korhonen, Tellervo; Latvala, Antti; Rose, Richard J; Kaprio, Jaakko; Dick, Danielle M

    2014-04-10

    Alcohol problems represent a classic example of a complex behavioral outcome that is likely influenced by many genes of small effect. A polygenic approach, which examines aggregate measured genetic effects, can have predictive power in cases where individual genes or genetic variants do not. In the current study, we first tested whether polygenic risk for alcohol problems, derived from genome-wide association estimates of an alcohol problems factor score from the age 18 assessment of the Avon Longitudinal Study of Parents and Children (ALSPAC; n = 4304 individuals of European descent; 57% female), predicted alcohol problems earlier in development (age 14) in an independent sample (FinnTwin12; n = 1162; 53% female). We then tested whether environmental factors (parental knowledge and peer deviance) moderated polygenic risk to predict alcohol problems in the FinnTwin12 sample. We found evidence for both polygenic association and for additive polygene-environment interaction. Higher polygenic scores predicted a greater number of alcohol problems (range of Pearson partial correlations 0.07-0.08, all p-values ≤ 0.01). Moreover, genetic influences were significantly more pronounced under conditions of low parental knowledge or high peer deviance (unstandardized regression coefficients (b), p-values (p), and percent of variance (R2) accounted for by interaction terms: b = 1.54, p = 0.02, R2 = 0.33%; b = 0.94, p = 0.04, R2 = 0.30%, respectively). Supplementary set-based analyses indicated that the individual top single nucleotide polymorphisms (SNPs) contributing to the polygenic scores were not individually enriched for gene-environment interaction. Although the magnitude of the observed effects is small, this study illustrates the usefulness of polygenic approaches for understanding the pathways by which measured genetic predispositions come together with environmental factors to predict complex behavioral outcomes.

  12. Polygenic Scores Predict Alcohol Problems in an Independent Sample and Show Moderation by the Environment

    Directory of Open Access Journals (Sweden)

    Jessica E. Salvatore

    2014-04-01

    Alcohol problems represent a classic example of a complex behavioral outcome that is likely influenced by many genes of small effect. A polygenic approach, which examines aggregate measured genetic effects, can have predictive power in cases where individual genes or genetic variants do not. In the current study, we first tested whether polygenic risk for alcohol problems, derived from genome-wide association estimates of an alcohol problems factor score from the age 18 assessment of the Avon Longitudinal Study of Parents and Children (ALSPAC; n = 4304 individuals of European descent; 57% female), predicted alcohol problems earlier in development (age 14) in an independent sample (FinnTwin12; n = 1162; 53% female). We then tested whether environmental factors (parental knowledge and peer deviance) moderated polygenic risk to predict alcohol problems in the FinnTwin12 sample. We found evidence for both polygenic association and for additive polygene-environment interaction. Higher polygenic scores predicted a greater number of alcohol problems (range of Pearson partial correlations 0.07–0.08, all p-values ≤ 0.01). Moreover, genetic influences were significantly more pronounced under conditions of low parental knowledge or high peer deviance (unstandardized regression coefficients (b), p-values (p), and percent of variance (R2) accounted for by interaction terms: b = 1.54, p = 0.02, R2 = 0.33%; b = 0.94, p = 0.04, R2 = 0.30%, respectively). Supplementary set-based analyses indicated that the individual top single nucleotide polymorphisms (SNPs) contributing to the polygenic scores were not individually enriched for gene-environment interaction. Although the magnitude of the observed effects is small, this study illustrates the usefulness of polygenic approaches for understanding the pathways by which measured genetic predispositions come together with environmental factors to predict complex behavioral outcomes.

  13. Rational Arithmetic Mathematica Functions to Evaluate the Two-Sided One Sample K-S Cumulative Sampling Distribution

    Directory of Open Access Journals (Sweden)

    J. Randall Brown

    2007-06-01

    One of the most widely used goodness-of-fit tests is the two-sided one sample Kolmogorov-Smirnov (K-S) test, which has been implemented by many computer statistical software packages. To calculate a two-sided p value (evaluate the cumulative sampling distribution), these packages use various methods including recursion formulae, limiting distributions, and approximations of unknown accuracy developed over thirty years ago. Based on an extensive literature search for the two-sided one sample K-S test, this paper identifies an exact formula for sample sizes up to 31, six recursion formulae, and one matrix formula that can be used to calculate a p value. To ensure accurate calculation by avoiding catastrophic cancellation and eliminating rounding error, each of these formulae is implemented in rational arithmetic. For the six recursion formulae and the matrix formula, computational experience for sample sizes up to 500 shows that computational times are increasing functions of both the sample size and the number of digits in the numerator and denominator integers of the rational number test statistic. The computational times of the seven formulae vary immensely, but the Durbin recursion formula is almost always the fastest. Linear search is used to calculate the inverse of the cumulative sampling distribution (find the confidence interval half-width), and tables of calculated half-widths are presented for sample sizes up to 500. Using calculated half-widths as input, computational times for the fastest formula, the Durbin recursion formula, are given for sample sizes up to two thousand.
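
    The paper's implementations are Mathematica functions; the sketch below is a loose Python illustration of the rational-arithmetic idea only. It computes the two-sided one sample K-S test statistic as an exact fraction and does not reproduce any of the p value formulae the paper compares. The function name and the sample are hypothetical, and the hypothesized distribution is assumed to be uniform on [0, 1].

```python
from fractions import Fraction


def ks_statistic_exact(sample, cdf):
    """Two-sided one sample K-S statistic D_n as an exact rational number.

    Assumes `sample` holds Fractions and `cdf` returns a Fraction, so no
    floating-point rounding enters the calculation.
    """
    xs = sorted(sample)
    n = len(xs)
    d = Fraction(0)
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare the hypothesized CDF with the empirical CDF just
        # after (i/n) and just before ((i-1)/n) the i-th order statistic.
        d = max(d, Fraction(i, n) - f, f - Fraction(i - 1, n))
    return d


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], kept in rationals."""
    return min(max(x, Fraction(0)), Fraction(1))


# Hypothetical sample expressed as exact fractions.
sample = [Fraction(1, 10), Fraction(2, 5), Fraction(9, 10)]
print(ks_statistic_exact(sample, uniform_cdf))
```

    Keeping the statistic as a fraction is what lets a downstream p value formula avoid rounding error, at the cost of arithmetic on integers whose digit counts grow with the sample size.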

  14. The Score Reliability of Draw-a-Person Intellectual Ability Test (DAP: IQ) for Rural Malawi Students

    Science.gov (United States)

    Khasu, Denis S.; Williams, Thomas O., Jr.

    2016-01-01

    In this brief article, the reliability of scores for the Draw-A-Person Intellectual Ability Test for Children, Adolescents, and Adults (DAP: IQ; Reynolds & Hickman, 2004) was examined through several analyses with a sample of 147 children from rural Malawi, Africa using a Chichewa translation of instructions. Cronbach alpha coefficients for…

  15. REPRODUCIBILITY OF THE MODIFIED STAR EXCURSION BALANCE TEST COMPOSITE AND SPECIFIC REACH DIRECTION SCORES.

    Science.gov (United States)

    van Lieshout, Remko; Reijneveld, Elja A E; van den Berg, Sandra M; Haerkens, Gijs M; Koenders, Niek H; de Leeuw, Arina J; van Oorsouw, Roel G; Paap, Davy; Scheffer, Else; Weterings, Stijn; Stukstette, Mirelle J

    2016-06-01

    The mSEBT is a screening tool used to evaluate dynamic balance. Most research investigating measurement properties focused on intrarater reliability and was done in small samples. To know whether the mSEBT is useful to discriminate dynamic balance between persons and to evaluate changes in dynamic balance, more research into intra- and interrater reliability and smallest detectable change (synonymous with minimal detectable change) is needed. To estimate intra- and interrater reliability and smallest detectable change of the mSEBT in adults at risk for ankle sprain. Cross-sectional, test-retest design. Fifty-five healthy young adults participating in sports at risk for ankle sprain participated (mean ± SD age, 24.0 ± 2.9 years). Each participant performed three test sessions within one hour and was rated by two physical therapists (session 1, rater 1; session 2, rater 2; session 3, rater 1). Participants and raters were blinded for previous measurements. Normalized composite and reach direction scores for the right and left leg were collected. Analysis of variance was used to calculate intraclass correlation coefficient values for intra- and interrater reliability. Smallest detectable change values were calculated based on the standard error of measurement. Intra- and interrater reliability for both legs was good to excellent (intraclass correlation coefficient ranging from 0.87 to 0.94). The intrarater smallest detectable change for the composite score of the right leg was 7.2% and for the left 6.2%. The interrater smallest detectable change for the composite score of the right leg was 6.9% and for the left 5.0%. The mSEBT is a reliable measurement instrument to discriminate dynamic balance between persons. Most smallest detectable change values of the mSEBT appear to be large. More research is needed to investigate if the mSEBT is usable for evaluative purposes. Level 2.
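
    The abstract states that smallest detectable change values were derived from the standard error of measurement but does not give the formula. The sketch below assumes the commonly used relations SEM = SD x sqrt(1 - ICC) and SDC = 1.96 x sqrt(2) x SEM; the function names and input values are hypothetical and are not the study's data.

```python
from math import sqrt


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient.

    Assumption: SEM = SD * sqrt(1 - ICC), a common convention that the
    abstract itself does not spell out.
    """
    return sd * sqrt(1.0 - icc)


def smallest_detectable_change(sd, icc):
    """SDC at the 95% level for a difference between two measurements."""
    # sqrt(2) accounts for measurement error in both test sessions.
    return 1.96 * sqrt(2.0) * standard_error_of_measurement(sd, icc)


# Hypothetical values: SD of 10 percentage points and ICC of 0.91.
print(round(smallest_detectable_change(sd=10.0, icc=0.91), 2))
```

    Under these assumptions, even a high reliability coefficient can leave a sizeable detectable change when the between-subject spread is large, which is consistent with the abstract's remark that most values appear large.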

  16. The Weighted Airman Promotion System: Standardizing Test Scores

    Science.gov (United States)

    2008-01-01

    [Figure: authorized Top 3/E6 ratio, inventory] ... AFHRL convened a panel to identify the relevant factors to consider, and then sit as a promotion board and rank ... Costs If the Air Force decided to standardize test scores, there would be three basic types of costs: implementation costs, marketing costs, and

  17. Two-sample discrimination of Poisson means

    Science.gov (United States)

    Lampton, M.

    1994-01-01

    This paper presents a statistical test for detecting significant differences between two random count accumulations. The null hypothesis is that the two samples share a common random arrival process with a mean count proportional to each sample's exposure. The model represents the partition of N total events into two counts, A and B, as a sequence of N independent Bernoulli trials whose partition fraction, f, is determined by the ratio of the exposures of A and B. The detection of a significant difference is claimed when the background (null) hypothesis is rejected, which occurs when the observed sample falls in a critical region of (A, B) space. The critical region depends on f and the desired significance level, alpha. The model correctly takes into account the fluctuations in both the signals and the background data, including the important case of small numbers of counts in the signal, the background, or both. The significance can be exactly determined from the cumulative binomial distribution, which in turn can be inverted to determine the critical A(B) or B(A) contour. This paper gives efficient implementations of these tests, based on lookup tables. Applications include the detection of clustering of astronomical objects, the detection of faint emission or absorption lines in photon-limited spectroscopy, the detection of faint emitters or absorbers in photon-limited imaging, and dosimetry.
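
    The model described above can be sketched directly: under the null hypothesis, the count A out of N = A + B total events follows a binomial distribution with partition fraction f set by the exposure ratio. The paper's own implementations rely on lookup tables; the version below instead sums the binomial tail directly, and the function names and example counts are hypothetical.

```python
from math import comb


def binomial_upper_tail(a, n, f):
    """P(X >= a) for X ~ Binomial(n, f)."""
    return sum(comb(n, k) * f**k * (1.0 - f) ** (n - k) for k in range(a, n + 1))


def excess_in_a_p_value(count_a, count_b, exposure_a, exposure_b):
    """One-sided significance of an excess in A relative to B.

    Under the null, each of the N events lands in A with probability
    f = exposure_a / (exposure_a + exposure_b).
    """
    n = count_a + count_b
    f = exposure_a / (exposure_a + exposure_b)
    return binomial_upper_tail(count_a, n, f)


# Hypothetical counts: 9 events on source, 4 on background, equal exposures.
print(excess_in_a_p_value(9, 4, exposure_a=1.0, exposure_b=1.0))
```

    Rejecting the null when this value falls below the chosen alpha corresponds to the observed (A, B) pair lying in the critical region; a test for a deficit in A would use the lower tail instead.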

  18. Zertifikat Deutsch als Fremdsprache and the Oral Proficiency Interview: A Comparison of Test Scores and Examinations.

    Science.gov (United States)

    Lalande, John F.; Schweckendiek, Jurgen

    1986-01-01

    Investigates what correlations might exist between an individual's score on the Zertifikat Deutsch als Fremdsprache and on the Oral Proficiency Interview. The tests themselves are briefly described. Results indicate that the two tests appear to correlate well in their evaluation of speaking skills. (SED)

  19. Short communication prevalence of susceptibility to etravirine by genotype and phenotype in samples received for routine HIV type 1 resistance testing in the United States.

    Science.gov (United States)

    Picchio, Gaston; Vingerhoets, Johan; Tambuyzer, Lotke; Coakley, Eoin; Haddad, Mojgan; Witek, James

    2011-12-01

    Abstract The prevalence of susceptibility to etravirine was investigated among clinical samples submitted for routine clinical testing in the United States using two separate weighted genotypic scoring systems. The presence of etravirine mutations and susceptibility to etravirine by phenotype of clinical samples from HIV-1-infected patients, submitted to Monogram Biosciences for routine resistance testing between June 2008 and June 2009, were analyzed. Susceptibility by genotype was determined using the Monogram and Tibotec etravirine-weighted genotypic scoring systems, with scores of ≤3 and ≤2, respectively, indicating full susceptibility. Susceptibility by phenotype was determined using the PhenoSense HIV assay, with lower and higher clinical cut-offs of 2.9 and 10, respectively. The frequency of individual etravirine mutations and the impact of the K103N mutation on susceptibility to etravirine by genotype were also determined. Among the 5482 samples with ≥1 defined nonnucleoside reverse transcriptase inhibitor (NNRTI) mutations associated with resistance, 67% were classed as susceptible to etravirine by genotype by both scoring systems. Susceptibility to etravirine by phenotype was higher (76%). The proportion of first-generation NNRTI-resistant samples with (n=3598) and without (n=1884) K103N with susceptibility to etravirine by genotype was 77% and 49%, respectively. Among samples susceptible to first-generation NNRTIs (n=9458), >99% of samples were susceptible to etravirine by phenotype (FC <2.9); the remaining samples had FC ≥2.9-10. In summary, among samples submitted for routine clinical testing in the United States, a high proportion of samples with first-generation NNRTI resistance was susceptible to etravirine by genotype and phenotype. A higher proportion of NNRTI-resistant samples with K103N than without was susceptible to etravirine.

  20. Generalization of the Lord-Wingersky Algorithm to Computing the Distribution of Summed Test Scores Based on Real-Number Item Scores

    Science.gov (United States)

    Kim, Seonghoon

    2013-01-01

    With known item response theory (IRT) item parameters, Lord and Wingersky provided a recursive algorithm for computing the conditional frequency distribution of number-correct test scores, given proficiency. This article presents a generalized algorithm for computing the conditional distribution of summed test scores involving real-number item…
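
    For orientation, the original Lord-Wingersky recursion for dichotomous (right/wrong) items can be sketched as below. This is the classic number-correct case only, not the generalized real-number algorithm that the article presents; the function name and the item probabilities are hypothetical.

```python
def number_correct_distribution(p_correct):
    """Distribution of number-correct scores at one fixed proficiency.

    `p_correct[i]` is the probability of answering item i correctly,
    as given by an IRT model at that proficiency. Items are assumed
    conditionally independent given proficiency.
    """
    dist = [1.0]  # with no items, a score of 0 has probability 1
    for p in p_correct:
        updated = [0.0] * (len(dist) + 1)
        for score, prob in enumerate(dist):
            updated[score] += prob * (1.0 - p)  # item answered incorrectly
            updated[score + 1] += prob * p      # item answered correctly
        dist = updated
    return dist


# Hypothetical item probabilities at a single proficiency value.
print(number_correct_distribution([0.8, 0.6, 0.5]))
```

    Each pass adds one item and costs time proportional to the number of scores reached so far, which avoids enumerating every response pattern.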

  1. Online pre-race education improves test scores for volunteers at a marathon.

    Science.gov (United States)

    Maxwell, Shane; Renier, Colleen; Sikka, Robby; Widstrom, Luke; Paulson, William; Christensen, Trent; Olson, David; Nelson, Benjamin

    2017-09-01

    This study examined whether an online course would lead to increased knowledge about the medical issues volunteers encounter during a marathon. Health care professionals who volunteered to provide medical coverage for an annual marathon were eligible for the study. Demographic information about medical volunteers including profession, specialty, education level and number of marathons they had volunteered for was collected. A 15-question test about the most commonly encountered medical issues was created by the authors and administered before and after the volunteers took the online educational course and compared to a pilot study the previous year. Seventy-four subjects completed the pre-test. Those who participated in the pilot study last year (N = 15) had pre-test scores that were an average of 2.4 points higher than those who did not (mean ranks: pilot study = 51.6 vs. non-pilot = 33.9, p = 0.004). Of the 74 subjects who completed the pre-test, 54 also completed the post-test. The overall post-pre mean score difference was 3.8 ± 2.7 (t = 10.5 df = 53 p online education demonstrated a long-term (one-year) increase in test scores. Testing also continued to show short-term improvement in post-course test scores, compared to pre-course test scores. In general, marathon medical volunteers who had no volunteer experience demonstrated greater improvement than those who had prior volunteer experience.

  2. Two phase sampling

    CERN Document Server

    Ahmad, Zahoor; Hanif, Muhammad

    2013-01-01

    The development of estimators of population parameters based on two-phase sampling schemes has seen a dramatic increase in the past decade. Various authors have developed estimators of population using either one or two auxiliary variables. The present volume is a comprehensive collection of estimators available in single and two phase sampling. The book covers estimators which utilize information on single, two and multiple auxiliary variables of both quantitative and qualitative nature. Th...

  3. The Impact of Correction for Guessing Formula on MC and Yes/No Vocabulary Tests' Scores

    Directory of Open Access Journals (Sweden)

    Abdollah Baradaran

    2009-10-01

    A standard correction for random guessing (cfg) formula on multiple-choice and Yes/No examinations was examined retrospectively in the scores of intermediate female EFL learners in an English language school. The correction was a weighting formula for points awarded for correct answers, incorrect answers, and unanswered questions so that the expected value of the increase in test score due to guessing was zero. The researcher compared uncorrected and corrected scores on examinations using multiple-choice and Yes/No formats. These short-answer formats eliminated or at least greatly reduced the potential for guessing the correct answer. The expectation for students to improve their grade by guessing on multiple-choice and Yes/No format examinations is well known. The researcher examined a method for correcting for random ("no knowledge") guessing on multiple-choice and Yes/No vocabulary examinations by comparing application and non-application of the cfg formula on scores on these examinations. It was done to determine whether the test takers really knew the correct answer or had resorted to guessing. This study represented a unique opportunity to compare scores from multiple-choice and Yes/No examinations in a setting in which students were given the same number of questions in each of the two format types testing their knowledge over the same subject matter. The results of this study indicated significant differences between the subjects' scores when the cfg formula was applied and when it was not.
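
    The abstract does not state the formula itself. A minimal sketch, assuming the conventional formula-scoring rule (corrected score = right - wrong / (k - 1) for k answer options, with unanswered questions scoring zero), is given below; the function name and the example counts are hypothetical.

```python
def corrected_score(right, wrong, options):
    """Score corrected for random guessing.

    Assumption: the conventional rule right - wrong / (options - 1).
    A blind guess gains 1 point with probability 1/options and loses
    1/(options - 1) otherwise, so its expected gain is zero.
    Unanswered questions are neither rewarded nor penalized.
    """
    if options < 2:
        raise ValueError("an item needs at least two answer options")
    return right - wrong / (options - 1)


# Hypothetical learner: 30 right, 10 wrong, the rest left unanswered.
print(corrected_score(30, 10, options=4))  # multiple-choice, four options
print(corrected_score(30, 10, options=2))  # Yes/No format
```

    Under this rule the penalty is heaviest for the Yes/No format, where each wrong answer cancels one right answer.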

  4. Effects of age, gender, education and race on two tests of language ability in community-based older adults.

    Science.gov (United States)

    Snitz, Beth E; Unverzagt, Frederick W; Chang, Chung-Chou H; Bilt, Joni Vander; Gao, Sujuan; Saxton, Judith; Hall, Kathleen S; Ganguli, Mary

    2009-12-01

    Neuropsychological tests, including tests of language ability, are frequently used to differentiate normal from pathological cognitive aging. However, language can be particularly difficult to assess in a standardized manner in cross-cultural studies and in patients from different educational and cultural backgrounds. This study examined the effects of age, gender, education and race on performance of two language tests: the animal fluency task (AFT) and the Indiana University Token Test (IUTT). We report population-based normative data on these tests from two combined ethnically divergent, cognitively normal, representative population samples of older adults. Participants aged ≥65 years from the Monongahela-Youghiogheny Healthy Aging Team (MYHAT) and from the Indianapolis Study of Health and Aging (ISHA) were selected based on (1) a Clinical Dementia Rating (CDR) score of 0; (2) non-missing baseline language test data; and (3) race self-reported as African-American or white. The combined sample (n = 1885) was 28.1% African-American. Multivariate ordinal logistic regression was used to model the effects of demographic characteristics on test scores. On both language tests, better performance was significantly associated with higher education, younger age, and white race. On the IUTT, better performance was also associated with female gender. We found no significant interactions between age and sex, or between race and education. Age and education are more potent variables than are race and gender in influencing performance on these language tests. Demographically stratified normative tables for these measures can be used to guide test interpretation and aid clinical diagnosis of impaired cognition.

  5. A Rigorous Test of the Fit of the Circumplex Model to Big Five Personality Data: Theoretical and Methodological Issues and Two Large Sample Empirical Tests.

    Science.gov (United States)

    DeGeest, David Scott; Schmidt, Frank

    2015-01-01

    Our objective was to apply the rigorous test developed by Browne (1992) to determine whether the circumplex model fits Big Five personality data. This test has yet to be applied to personality data. Another objective was to determine whether blended items explained correlations among the Big Five traits. We used two working adult samples, the Eugene-Springfield Community Sample and the Professional Worker Career Experience Survey. Fit to the circumplex was tested via Browne's (1992) procedure. Circumplexes were graphed to identify items with loadings on multiple traits (blended items), and to determine whether removing these items changed five-factor model (FFM) trait intercorrelations. In both samples, the circumplex structure fit the FFM traits well. Each sample had items with dual-factor loadings (8 items in the first sample, 21 in the second). Removing blended items had little effect on construct-level intercorrelations among FFM traits. We conclude that rigorous tests show that the fit of personality data to the circumplex model is good. This finding means the circumplex model is competitive with the factor model in understanding the organization of personality traits. The circumplex structure also provides a theoretically and empirically sound rationale for evaluating intercorrelations among FFM traits. Even after eliminating blended items, FFM personality traits remained correlated.

  6. Semiparametric Copula Models for Biometric Score Level

    NARCIS (Netherlands)

    Caselli, M.

    2016-01-01

    In biometric recognition systems, biometric samples (images of faces, fingerprints, voices, gaits, etc.) of people are compared and classifiers (matchers) indicate the level of similarity between any pair of samples by a score. If two samples of the same person are compared, a genuine score is

  7. Test Scores, Class Rank and College Performance: Lessons for Broadening Access and Promoting Success.

    Science.gov (United States)

    Niu, Sunny X; Tienda, Marta

    2012-04-01

    Using administrative data for five Texas universities that differ in selectivity, this study evaluates the relative influence of two key indicators for college success: high school class rank and standardized tests. Empirical results show that class rank is the superior predictor of college performance and that test score advantages do not insulate lower ranked students from academic underperformance. Using the UT-Austin campus as a test case, we conduct a simulation to evaluate the consequences of capping students admitted automatically using both achievement metrics. We find that using class rank to cap the number of students eligible for automatic admission would have roughly uniform impacts across high schools, but imposing a minimum test score threshold on all students would have highly unequal consequences by greatly reducing the admission eligibility of the highest performing students who attend poor high schools while not jeopardizing admissibility of students who attend affluent high schools. We discuss the implications of the Texas admissions experiment for higher education in Europe.

  8. Using Raters from India to Score a Large-Scale Speaking Test

    Science.gov (United States)

    Xi, Xiaoming; Mollaun, Pam

    2011-01-01

    We investigated the scoring of the Speaking section of the Test of English as a Foreign Language[TM] Internet-based (TOEFL iBT[R]) test by speakers of English and one or more Indian languages. We explored the extent to which raters from India, after being trained and certified, were able to score the TOEFL examinees with mixed first languages…

  9. Use of Verbal Descriptors, Thermal Scores and Electrical Pulp Testing Scores as Predictors of Tooth Pain Before and After Application of Benzocaine Gels into Cavities of Teeth with Pulpitis

    Science.gov (United States)

    Gangarosa, Louis P.; Ciarlone, Alfred E.; Neaverth, Elmer J.; Johnston, Carey A.; Snowden, J. Douglas; Thompson, William O.

    1989-01-01

    A double-blind pilot study was conducted on 27 consenting human volunteers who had irreversible pulpitis associated with persistent toothache pain from open carious lesions. Formulations tested contained either 0, 10%, or 20% benzocaine and were identified only by a numbered code. Before the experiment started, a small amount of a known 5% benzocaine gel was placed for 1 minute on the tongue of each patient to assure a sensation of numbness within the oral cavity. Then the test tooth was washed with a gentle stream of warm water and dried with gauze. A randomly selected test medication was placed into the open cavity and around the gingival margins for 5 minutes. Pre- and posttreatment tests were conducted at the following timed intervals: 0, 5, 15, 30, 45, 60, 75 and 90 minutes. The tests included degree of pain (rated: 0 = none, 1 = mild, 2 = moderate, 3 = severe); electrical pulp testing (EPT) by a modified, voltage-ramping instrument; and ice water testing (0.5 mL directed quickly onto sound enamel of the tooth and rated: 0 to 4, with 4 being intolerable). After testing, or when pain returned to baseline, endodontic procedures were performed. There was a significant increase (p pulpitis and control teeth, 3) there were no correlations between direction of EPT scores and pain relief, 4) cold water testing was a good predictor of whether or not a tooth had pulpitis, and 5) changes in cold water testing scores after treatment could not be correlated to relief of pain according to verbal descriptors. The effectiveness of benzocaine in relieving toothache pain verifies previous studies; however, a difference between 10% and 20% benzocaine could not be demonstrated probably because of two factors: 1) the present experiment had a small sample size, and 2) there was no direct measurement of duration of local anesthesia. PMID:2490060

  10. A GMM-Based Test for Normal Disturbances of the Heckman Sample Selection Model

    Directory of Open Access Journals (Sweden)

    Michael Pfaffermayr

    2014-10-01

    The Heckman sample selection model relies on the assumption of normal and homoskedastic disturbances. However, before considering more general, alternative semiparametric models that do not need the normality assumption, it seems useful to test this assumption. Following Meijer and Wansbeek (2007), the present contribution derives a GMM-based pseudo-score LM test on whether the third and fourth moments of the disturbances of the outcome equation of the Heckman model conform to those implied by the truncated normal distribution. The test is easy to calculate and in Monte Carlo simulations it shows good performance for sample sizes of 1000 or larger.

  11. Intra- and inter-rater reliability of the Knee Society Knee Score when used by two physiotherapists in patients post total knee arthroplasty

    Directory of Open Access Journals (Sweden)

    S. Gopal

    2010-01-01

    Background and Purpose: It has yet to be shown whether routine physiotherapy plays a role in the rehabilitation of patients post total knee arthroplasty (Rajan et al 2004). Physiotherapists should be using valid outcome measures to provide evidence of the benefit of their intervention. The aim of this study was to establish the intra- and inter-rater reliability of the Knee Society Knee Score, a scoring system developed by Insall et al (1989). The Knee Society Knee Score can be used to assess the integrity of the knee joint of patients undergoing total knee arthroplasty. Since the score involves clinical testing, the intra-rater reliability of the clinician should be established prior to using the scores as data in clinical research. Where multiple clinicians are involved, inter-rater reliability should also be established. Design: This was a correlation study. Subjects: A sample of thirty patients post total knee arthroplasty attending the arthroplasty clinic at Johannesburg Hospital between six weeks and twelve months postoperatively. Method: Recruited patients were evaluated twice with a time interval of one hour between each assessment. Statistical Analysis: The intra- and inter-rater reliability were estimated using the Intraclass Correlation Coefficient (ICC). Results: The intra-rater reliability showed excellent reliability (h = 0.95) for Examiner A and good reliability (h = 0.71) for Examiner B. The inter-rater reliability showed moderate reliability (h = 0.67 during test one and h = 0.66 during test two). Conclusion: The KSKS has good intra-rater reliability when tested within a period of one hour. The KSKS demonstrated moderate agreement for inter-rater reliability.
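
The abstract above does not say which form of the Intraclass Correlation Coefficient was used. As an illustration only, the following minimal sketch assumes ICC(2,1), the two-way random-effects, absolute-agreement, single-rater form. The function name `icc_2_1` and the sample scores are hypothetical and are not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per patient, one score per rater.
    """
    n = len(ratings)       # number of patients (subjects)
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)              # between-patient mean square
    ms_cols = ss_cols / (k - 1)              # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical knee scores from two examiners for five patients.
scores = [[82, 80], [65, 70], [90, 88], [74, 79], [58, 55]]
print(round(icc_2_1(scores), 2))
```

Other ICC forms (one-way, or consistency rather than absolute agreement) use different denominators, so the value depends on the form chosen.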

  12. Further examination of embedded performance validity indicators for the Conners' Continuous Performance Test and Brief Test of Attention in a large outpatient clinical sample.

    Science.gov (United States)

    Sharland, Michael J; Waring, Stephen C; Johnson, Brian P; Taran, Allise M; Rusin, Travis A; Pattock, Andrew M; Palcher, Jeanette A

    2018-01-01

    Assessing test performance validity is a standard clinical practice and although studies have examined the utility of cognitive/memory measures, few have examined attention measures as indicators of performance validity beyond the Reliable Digit Span. The current study further investigates the classification probability of embedded Performance Validity Tests (PVTs) within the Brief Test of Attention (BTA) and the Conners' Continuous Performance Test (CPT-II), in a large clinical sample. This was a retrospective study of 615 patients consecutively referred for comprehensive outpatient neuropsychological evaluation. Non-credible performance was defined two ways: failure on one or more PVTs and failure on two or more PVTs. Classification probability of the BTA and CPT-II into non-credible groups was assessed. Sensitivity, specificity, positive predictive value, and negative predictive value were derived to identify clinically relevant cut-off scores. When using failure on two or more PVTs as the indicator for non-credible responding compared to failure on one or more PVTs, highest classification probability, or area under the curve (AUC), was achieved by the BTA (AUC = .87 vs. .79). CPT-II Omission, Commission, and Total Errors exhibited higher classification probability as well. Overall, these findings corroborate previous findings, extending them to a large clinical sample. BTA and CPT-II are useful embedded performance validity indicators within a clinical battery but should not be used in isolation without other performance validity indicators.
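
The four accuracy measures named in the abstract above all come from a 2×2 table that crosses a cut-off decision against the reference classification. A minimal sketch, assuming hypothetical counts (the function name `diagnostic_metrics` and the numbers are illustrative, not data from the study):

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return accuracy measures for a binary cut-off.

    tp: non-credible cases flagged by the cut-off
    fp: credible cases flagged by the cut-off
    fn: non-credible cases missed by the cut-off
    tn: credible cases correctly not flagged
    """
    return {
        "sensitivity": tp / (tp + fn),  # flagged share of non-credible cases
        "specificity": tn / (tn + fp),  # cleared share of credible cases
        "ppv": tp / (tp + fp),          # probability a flag is correct
        "npv": tn / (tn + fn),          # probability a non-flag is correct
    }


# Hypothetical counts for one candidate cut-off score.
metrics = diagnostic_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are properties of the cut-off itself, whereas PPV and NPV also depend on the base rate of non-credible performance in the sample.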

  13. Measurement of ability emotional intelligence: results for two new tests.

    Science.gov (United States)

    Austin, Elizabeth J

    2010-08-01

    Emotional intelligence (EI) has attracted considerable interest amongst both individual differences researchers and those in other areas of psychology who are interested in how EI relates to criteria such as well-being and career success. Both trait (self-report) and ability EI measures have been developed; the focus of this paper is on ability EI. The associations of two new ability EI tests with psychometric intelligence, emotion perception, and the Mayer-Salovey-Caruso EI test (MSCEIT) were examined. The new EI tests were the Situational Test of Emotion Management (STEM) and the Situational Test of Emotional Understanding (STEU). Only the STEU and the MSCEIT Understanding Emotions branch were significantly correlated with psychometric intelligence, suggesting that only understanding emotions can be regarded as a candidate new intelligence component. These understanding emotions tests were also positively correlated with emotion perception tests, and STEM and STEU scores were positively correlated with MSCEIT total score and most branch scores. Neither the STEM nor the STEU were significantly correlated with trait EI tests, confirming the distinctness of trait and ability EI. Taking the present results as a starting-point, approaches to the development of new ability EI tests and models of EI are suggested.

  14. Validating Score Interpretations and Uses: Messick Lecture, Language Testing Research Colloquium, Cambridge, April 2010

    Science.gov (United States)

    Kane, Michael

    2012-01-01

    The argument-based approach to validation involves two steps; specification of the proposed interpretations and uses of the test scores as an interpretive argument, and the evaluation of the plausibility of the proposed interpretive argument. More ambitious interpretations and uses tend to involve an extended network of inferences and assumptions…

  15. Testing the applicability of the SASS5 scoring procedure for ...

    African Journals Online (AJOL)

    A study was undertaken between 29th January and 17th February 2004 to test the applicability of the South African Scoring System Version 5 (SASS5) scoring and calculation procedure in nutrient-enriched palustrine wetlands in the midlands of KwaZulu-Natal, South Africa. Four reference wetlands and three dairy-effluent ...

  16. Evaluating the Predictive Validity of Graduate Management Admission Test Scores

    Science.gov (United States)

    Sireci, Stephen G.; Talento-Miller, Eileen

    2006-01-01

    Admissions data and first-year grade point average (GPA) data from 11 graduate management schools were analyzed to evaluate the predictive validity of Graduate Management Admission Test[R] (GMAT[R]) scores and the extent to which predictive validity held across sex and race/ethnicity. The results indicated GMAT verbal and quantitative scores had…

  17. A One-Sample Test for Normality with Kernel Methods

    OpenAIRE

    Kellner, Jérémie; Celisse, Alain

    2015-01-01

    We propose a new one-sample test for normality in a Reproducing Kernel Hilbert Space (RKHS). Namely, we test the null-hypothesis of belonging to a given family of Gaussian distributions. Hence our procedure may be applied either to test data for normality or to test parameters (mean and covariance) if data are assumed Gaussian. Our test is based on the same principle as the MMD (Maximum Mean Discrepancy) which is usually used for two-sample tests such as homogeneity or independence testing. O...

  18. Dimensional Structure and Measurement Invariance of the Schizotypal Personality Questionnaire - Brief Revised (SPQ-BR) Scores Across American and Spanish Samples.

    Science.gov (United States)

    Fonseca-Pedrero, Eduardo; Cohen, Alex; Ortuño-Sierra, Javier; de Álbeniz, Alicia Pérez; Muñiz, José

    2017-08-01

    The main goal of the present study was to test the measurement equivalence of the Schizotypal Personality Questionnaire - Brief Revised (SPQ-BR) scores in a large sample of Spanish and American non-clinical young adults. The sample was made up of 5,625 young adults (M = 19.65 years; SD = 2.53; 38.5% males). Study of the internal structure, using confirmatory factor analysis (CFA), revealed that SPQ-BR items were grouped in a theoretical internal structure of nine first-order factors. Moreover, three or four second-order factor and bifactor models showed adequate goodness-of-fit indices. Multigroup CFA showed that the nine lower-order factor models of the SPQ-BR had configural and weak measurement invariance and partial strong measurement invariance across country. The reliability of the SPQ-BR scores, estimated with omega, ranged from 0.67 to 0.91. Using the item response theory framework, the SPQ-BR provides more accurate information at the medium and high end of the latent trait. Statistically significant differences were found in the raw scores of the SPQ-BR subscales and dimensions across samples. The American group scored higher than the Spanish group in all SPQ-BR domains except Ideas of Reference and Suspiciousness. The finding of comparable factor structure in cross-cultural samples would lend further support to the continuum model of psychosis spectrum disorders. In addition, these results provide new information about the factor structure of schizotypal traits and support the validity and utility of this measure in cross-cultural research.

  19. Effects of Test Media on Different EFL Test-Takers in Writing Scores and in the Cognitive Writing Process

    Science.gov (United States)

    Zou, Xiao-Ling; Chen, Yan-Min

    2016-01-01

    The effects of computer and paper test media on EFL test-takers with different computer familiarity in writing scores and in the cognitive writing process have been comprehensively explored from the learners' aspect as well as on the basis of related theories and practice. The results indicate significant differences in test scores among the…

  20. Does breastfeeding contribute to the racial gap in reading and math test scores?

    Science.gov (United States)

    Peters, Kristen E; Huang, Jin; Vaughn, Michael G; Witko, Christopher

    2013-10-01

    The aim of this study was to examine the impact of divergent breastfeeding practices between Caucasian and African American mothers on the lingering achievement test gap between Caucasian and African American children. The Child Development Supplement of the Panel Study of Income Dynamics, beginning in 1997, followed a cohort of 3563 children aged 0-12 years. Reading and math test scores from 2002 for 1928 children were linked with breastfeeding history. Regression analysis was used to examine associations between ever having been breastfed and duration of breastfeeding and test scores, controlling for characteristics of child, mother, and household. African American students scored significantly lower than Caucasian children by 10.6 and 10.9 points on reading and math tests, respectively. After accounting for the impact of having been breastfed during infancy, the racial test gap decreased by 17% for reading scores and 9% for math scores. Study findings indicate that breastfeeding explains 17% and 9% of the observed gaps in reading and math scores, respectively, between African Americans and Caucasians, an effect larger than most recent educational policy interventions. Renewed efforts around policies and clinical practices that promote and remove barriers for African American mothers to breastfeed should be implemented. Copyright © 2013 Elsevier Inc. All rights reserved.

  1. Accountancy, teaching methods, sex, and American College Test scores.

    Science.gov (United States)

    Heritage, J; Harper, B S; Harper, J P

    1990-10-01

    This study examines the significance of sex, methodology, academic preparation, and age as related to development of judgmental and problem-solving skills. Sex, American College Test (ACT) Mathematics scores, Composite ACT scores, grades in course work, grade point average (GPA), and age were used in studying the effects of teaching method on 96 students' ability to analyze data in financial statements. Results reflect positively on accounting students compared to the general college population and the women students in particular.

  2. Reduce, Reuse, Recycle: The Longitudinal Value of Local Cut Scores Using State Test Data

    Science.gov (United States)

    Nelson, Peter M.; Van Norman, Ethan R.; VanDerHeyden, Amanda

    2017-01-01

    We used existing reading (n = 1,498) and math (n = 2,260) data to evaluate state test scores for screening middle school students. In Phase 1, state test data were used to create a research-derived cut score that was optimal for predicting state test performance the following year. In Phase 2, those cut scores were applied with future cohorts.…

  3. Refining Ovarian Cancer Test accuracy Scores (ROCkeTS): protocol for a prospective longitudinal test accuracy study to validate new risk scores in women with symptoms of suspected ovarian cancer

    Science.gov (United States)

    Sundar, Sudha; Rick, Caroline; Dowling, Francis; Au, Pui; Rai, Nirmala; Champaneria, Rita; Stobart, Hilary; Neal, Richard; Davenport, Clare; Mallett, Susan; Sutton, Andrew; Kehoe, Sean; Timmerman, Dirk; Bourne, Tom; Van Calster, Ben; Gentry-Maharaj, Aleksandra; Deeks, Jon

    2016-01-01

    Introduction: Ovarian cancer (OC) is associated with non-specific symptoms such as bloating, making accurate diagnosis challenging: only 1 in 3 women with OC presents through primary care referral. National Institute for Health and Care Excellence guidelines recommend sequential testing with CA125 and routine ultrasound in primary care. However, these diagnostic tests have limited sensitivity or specificity. Improving accurate triage in women with vague symptoms is likely to improve mortality by streamlining referral and care pathways. The Refining Ovarian Cancer Test Accuracy Scores (ROCkeTS; HTA 13/13/01) project will derive and validate new tests/risk prediction models that estimate the probability of having OC in women with symptoms. This protocol refers to the prospective study only (phase III). Methods and analysis: ROCkeTS comprises four parallel phases. The full ROCkeTS protocol can be found at http://www.birmingham.ac.uk/ROCKETS. Phase III is a prospective test accuracy study. The study will recruit 2450 patients from 15 UK sites. Recruited patients complete symptom and anxiety questionnaires, donate a serum sample and undergo ultrasound scored as per International Ovarian Tumour Analysis (IOTA) criteria. Recruitment is at rapid access clinics, emergency departments and elective clinics. Models to be evaluated include those based on ultrasound derived by the IOTA group and novel models derived from analysis of existing data sets. Estimates of sensitivity, specificity, c-statistic (area under receiver operating curve), positive predictive value and negative predictive value of diagnostic tests are evaluated and a calibration plot for models will be presented. ROCkeTS has received ethical approval from the NHS West Midlands REC (14/WM/1241) and is registered on the controlled trials website (ISRCTN17160843) and the National Institute of Health Research Cancer and Reproductive Health portfolios. PMID:27507231

  4. Evaluation of Approaches to Analyzing Continuous Correlated Eye Data When Sample Size Is Small.

    Science.gov (United States)

    Huang, Jing; Huang, Jiayan; Chen, Yong; Ying, Gui-Shuang

    2018-02-01

    To evaluate the performance of commonly used statistical methods for analyzing continuous correlated eye data when sample size is small. We simulated correlated continuous data from two designs: (1) two eyes of a subject in two comparison groups; (2) two eyes of a subject in the same comparison group, under various sample size (5-50), inter-eye correlation (0-0.75) and effect size (0-0.8). Simulated data were analyzed using paired t-test, two sample t-test, Wald test and score test using the generalized estimating equations (GEE) and F-test using linear mixed effects model (LMM). We compared type I error rates and statistical powers, and demonstrated analysis approaches through analyzing two real datasets. In design 1, paired t-test and LMM perform better than GEE, with nominal type 1 error rate and higher statistical power. In design 2, no test performs uniformly well: two sample t-test (average of two eyes or a random eye) achieves better control of type I error but yields lower statistical power. In both designs, the GEE Wald test inflates type I error rate and GEE score test has lower power. When sample size is small, some commonly used statistical methods do not perform well. Paired t-test and LMM perform best when two eyes of a subject are in two different comparison groups, and t-test using the average of two eyes performs best when the two eyes are in the same comparison group. When selecting the appropriate analysis approach the study design should be considered.

  5. Validity of GRE General Test scores and TOEFL scores for graduate admission to a technical university in Western Europe

    Science.gov (United States)

    Zimmermann, Judith; von Davier, Alina A.; Buhmann, Joachim M.; Heinimann, Hans R.

    2018-01-01

    Graduate admission has become a critical process in tertiary education, whereby selecting valid admissions instruments is key. This study assessed the validity of Graduate Record Examination (GRE) General Test scores for admission to Master's programmes at a technical university in Europe. We investigated the indicative value of GRE scores for the Master's programme grade point average (GGPA) with and without the addition of the undergraduate GPA (UGPA) and the TOEFL score, and of GRE scores for study completion and Master's thesis performance. GRE scores explained 20% of the variation in the GGPA, while an additional 7% was explained by the TOEFL score and 3% by the UGPA. Contrary to common belief, the GRE quantitative reasoning score showed little explanatory power. GRE scores were also weakly related to study progress but not to thesis performance. Nevertheless, GRE and TOEFL scores were found to be sensible admissions instruments. Rigorous methodology was used to obtain highly reliable results.

  6. The Performance of the Upper Limb scores correlate with pulmonary function test measures and Egen Klassifikation scores in Duchenne muscular dystrophy.

    Science.gov (United States)

    Lee, Ha Neul; Sawnani, Hemant; Horn, Paul S; Rybalsky, Irina; Relucio, Lani; Wong, Brenda L

    2016-01-01

    The Performance of the Upper Limb scale was developed as an outcome measure specifically for ambulant and non-ambulant patients with Duchenne muscular dystrophy and is implemented in clinical trials needing longitudinal data. The aim of this study is to determine whether this novel tool correlates with functional ability using pulmonary function test, cardiac function test and Egen Klassifikation scale scores as clinical measures. In this cross-sectional study, 43 non-ambulatory Duchenne males from ages 10 to 30 years and on long-term glucocorticoid treatment were enrolled. Cardiac and pulmonary function test results were analyzed to assess cardiopulmonary function, and Egen Klassifikation scores were analyzed to assess functional ability. The Performance of the Upper Limb scores correlated with pulmonary function measures and had inverse correlation with Egen Klassifikation scores. There was no correlation with left ventricular ejection fraction and left ventricular dysfunction. Body mass index and decreased joint range of motion affected total Performance of the Upper Limb scores and should be considered in clinical trial designs. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. Relative Merits of Four Methods for Scoring Cloze Tests.

    Science.gov (United States)

    Brown, James Dean

    1980-01-01

    Describes study comparing merits of exact answer, acceptable answer, clozentropy and multiple choice methods for scoring tests. Results show differences among reliability, mean item facility, discrimination and usability, but not validity. (BK)

  8. A comparison of likelihood ratio tests and Rao's score test for three separable covariance matrix structures.

    Science.gov (United States)

    Filipiak, Katarzyna; Klein, Daniel; Roy, Anuradha

    2017-01-01

    The problem of testing the separability of a covariance matrix against an unstructured variance-covariance matrix is studied in the context of multivariate repeated measures data using Rao's score test (RST). The RST statistic is developed with the first component of the separable structure as a first-order autoregressive (AR(1)) correlation matrix or an unstructured (UN) covariance matrix under the assumption of multivariate normality. It is shown that the distribution of the RST statistic under the null hypothesis of any separability does not depend on the true values of the mean or the unstructured components of the separable structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Monte Carlo simulations are then used to study the comparative behavior of the null distribution of the RST statistic, as well as that of the LRT statistic, in terms of sample size considerations, and for the estimation of the empirical percentiles. Our findings are compared with existing results where the first component of the separable structure is a compound symmetry (CS) correlation matrix. It is also shown by simulations that the empirical null distribution of the RST statistic converges faster than the empirical null distribution of the LRT statistic to the limiting χ² distribution. The tests are implemented on a real dataset from medical studies. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
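
For orientation, the generic textbook form of Rao's score statistic is sketched below. It is not the specific statistic derived in the paper above, and the symbols ($\ell$, $U$, $\mathcal{I}$, $\hat{\theta}_0$, $q$) are notation assumed here for illustration.

```latex
% \ell(\theta): log-likelihood; \hat{\theta}_0: maximum likelihood estimate under H_0
U(\theta) = \frac{\partial \ell(\theta)}{\partial \theta},
\qquad
\mathcal{I}(\theta) = -\,\mathbb{E}\!\left[\frac{\partial^2 \ell(\theta)}{\partial \theta\,\partial \theta^{\top}}\right],
\qquad
\mathrm{RS} = U(\hat{\theta}_0)^{\top}\,\mathcal{I}(\hat{\theta}_0)^{-1}\,U(\hat{\theta}_0)
\;\xrightarrow{d}\; \chi^2_q \quad \text{under } H_0,
```

where $q$ is the number of restrictions imposed by the null hypothesis. Because the statistic requires only the estimate under the null, it can remain computable in small samples where fitting the unrestricted model is not feasible, which is consistent with the small-sample advantage described in the abstract.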

  9. The Mediating Effect of Listening Metacognitive Awareness between Test-Taking Motivation and Listening Test Score: An Expectancy-Value Theory Approach.

    Science.gov (United States)

    Xu, Jian

    2017-01-01

    The present study investigated test-taking motivation in L2 listening testing context by applying Expectancy-Value Theory as the framework. Specifically, this study was intended to examine the complex relationships among expectancy, importance, interest, listening anxiety, listening metacognitive awareness, and listening test score using data from a large-scale and high-stakes language test among Chinese first-year undergraduates. Structural equation modeling was used to examine the mediating effect of listening metacognitive awareness on the relationship between expectancy, importance, interest, listening anxiety, and listening test score. According to the results, test takers' listening scores can be predicted by expectancy, interest, and listening anxiety significantly. The relationship between expectancy, interest, listening anxiety, and listening test score was mediated by listening metacognitive awareness. The findings have implications for test takers to improve their test taking motivation and listening metacognitive awareness, as well as for L2 teachers to intervene in L2 listening classrooms.

  10. The Mediating Effect of Listening Metacognitive Awareness between Test-Taking Motivation and Listening Test Score: An Expectancy-Value Theory Approach

    Directory of Open Access Journals (Sweden)

    Jian Xu

    2017-12-01

    The present study investigated test-taking motivation in L2 listening testing context by applying Expectancy-Value Theory as the framework. Specifically, this study was intended to examine the complex relationships among expectancy, importance, interest, listening anxiety, listening metacognitive awareness, and listening test score using data from a large-scale and high-stakes language test among Chinese first-year undergraduates. Structural equation modeling was used to examine the mediating effect of listening metacognitive awareness on the relationship between expectancy, importance, interest, listening anxiety, and listening test score. According to the results, test takers’ listening scores can be predicted by expectancy, interest, and listening anxiety significantly. The relationship between expectancy, interest, listening anxiety, and listening test score was mediated by listening metacognitive awareness. The findings have implications for test takers to improve their test taking motivation and listening metacognitive awareness, as well as for L2 teachers to intervene in L2 listening classrooms.

  11. The Dental Hygiene Aptitude Tests and the American College Testing Program Tests as Predictors of Scores on the National Board Dental Hygiene Examination.

    Science.gov (United States)

    Longenbecker, Sueann; Wood, Peter H.

    1984-01-01

    Scores from the National Board Dental Hygiene Examination (NBDHE) served as the criterion variable in a comparison of the predictive validity of the Dental Hygiene Aptitude Tests (DHAT) and the ACT Assessment tests. The DHAT-Science and Verbal tests combined to produce the highest multiple correlation with NBDHE scores. (Author/DWH)

  12. The Alzheimer’s Prevention Initiative composite cognitive test score: Sample size estimates for the evaluation of preclinical Alzheimer’s disease treatments in presenilin 1 E280A mutation carriers

    Science.gov (United States)

    Ayutyanont, Napatkamon; Langbaum, Jessica B.; Hendrix, Suzanne B.; Chen, Kewei; Fleisher, Adam S.; Friesenhahn, Michel; Ward, Michael; Aguirre, Camilo; Acosta-Baena, Natalia; Madrigal, Lucìa; Muñoz, Claudia; Tirado, Victoria; Moreno, Sonia; Tariot, Pierre N.; Lopera, Francisco; Reiman, Eric M.

    2014-01-01

    Objective: There is a need to identify a cognitive composite that is sensitive to tracking preclinical AD decline to be used as a primary endpoint in treatment trials. Method: We capitalized on longitudinal data, collected from 1995 to 2010, from cognitively unimpaired presenilin 1 (PSEN1) E280A mutation carriers from the world’s largest known early-onset autosomal dominant AD (ADAD) kindred to identify a composite cognitive test with the greatest statistical power to track preclinical AD decline and estimate the number of carriers age 30 and older needed to detect a treatment effect in the Alzheimer’s Prevention Initiative’s (API) preclinical AD treatment trial. The mean-to-standard-deviation ratios (MSDRs) of change over time were calculated in a search for the optimal combination of one to seven cognitive tests/sub-tests drawn from the neuropsychological test battery in cognitively unimpaired mutation carriers during a two and five year follow-up period, using data from non-carriers during the same time period to correct for aging and practice effects. Combinations that performed well were then evaluated for robustness across follow-up years, occurrence of selected items within top performing combinations and representation of relevant cognitive domains. Results: This optimal test combination included CERAD Word List Recall, CERAD Boston Naming Test (high frequency items), MMSE Orientation to Time, CERAD Constructional Praxis and Ravens Progressive Matrices (Set A) with an MSDR of 1.62. This composite is more sensitive than using either the CERAD Word List Recall (MSDR=0.38) or the entire CERAD-Col battery (MSDR=0.76). A sample size of 75 cognitively normal PSEN1-E280A mutation carriers age 30 and older per treatment arm allows for a detectable treatment effect of 29% in a 60-month trial (80% power, p=0.05). Conclusions: We have identified a composite cognitive test score representing multiple cognitive domains that has improved power compared to the most

  13. Manual for Scoring the Test of Directed Imagination.

    Science.gov (United States)

    Veldman, Donald J.; And Others

    A scoring manual for the Directed Imagination Test, a projective technique wherein the subject is instructed to write four fictional stories (four minutes are allowed for each) about teachers and their experiences, is presented. The manual provides detailed instructions for rating each story by fifteen dimensions relevant to teacher education…

  14. Polygenic Risk Score for Alzheimer's Disease: Implications for Memory Performance and Hippocampal Volumes in Early Life.

    Science.gov (United States)

    Axelrud, Luiza K; Santoro, Marcos L; Pine, Daniel S; Talarico, Fernanda; Gadelha, Ary; Manfro, Gisele G; Pan, Pedro M; Jackowski, Andrea; Picon, Felipe; Brietzke, Elisa; Grassi-Oliveira, Rodrigo; Bressan, Rodrigo A; Miguel, Eurípedes C; Rohde, Luis A; Hakonarson, Hakon; Pausova, Zdenka; Belangero, Sintia; Paus, Tomas; Salum, Giovanni A

    2018-06-01

    Alzheimer's disease is a heritable neurodegenerative disorder in which early-life precursors may manifest in cognition and brain structure. The authors evaluate this possibility by examining, in youths, associations among polygenic risk score for Alzheimer's disease, cognitive abilities, and hippocampal volume. Participants were children 6-14 years of age in two Brazilian cities, constituting the discovery (N=364) and replication samples (N=352). As an additional replication, data from a Canadian sample (N=1,029), with distinct tasks, MRI protocol, and genetic risk, were included. Cognitive tests quantified memory and executive function. Reading and writing abilities were assessed by standardized tests. Hippocampal volumes were derived from the Multiple Automatically Generated Templates (MAGeT) multi-atlas segmentation brain algorithm. Genetic risk for Alzheimer's disease was quantified using summary statistics from the International Genomics of Alzheimer's Project. Analyses showed that for the Brazilian discovery sample, each one-unit increase in z-score for Alzheimer's polygenic risk score significantly predicted a 0.185 decrement in z-score for immediate recall and a 0.282 decrement for delayed recall. Findings were similar for the Brazilian replication sample (immediate and delayed recall, β=-0.259 and β=-0.232, both significant). Quantile regressions showed lower hippocampal volumes bilaterally for individuals with high polygenic risk scores. Associations fell short of significance for the Canadian sample. Genetic risk for Alzheimer's disease may affect early-life cognition and hippocampal volumes, as shown in two independent samples. These data support previous evidence that some forms of late-life dementia may represent developmental conditions with roots in childhood. This result may vary depending on a sample's genetic risk and may be specific to some types of memory tasks.

  15. Bovine milk sampling efficiency for pregnancy-associated glycoproteins (PAG) detection test

    Energy Technology Data Exchange (ETDEWEB)

    Silva, H. K. da; Cassoli, L.D.; Pantoja, J.F.C.; Cerqueira, P.H.R.; Coitinho, T.B.; Machado, P.F.

    2016-07-01

    Two experiments were conducted to verify whether the time of day at which a milk sample is collected and the possible carryover in the milking system may affect pregnancy-associated glycoproteins (PAG) levels and, consequently, the pregnancy test results in dairy cows. In experiment one, we evaluated the effect of time of day at which the milk sample is collected from 51 cows. In experiment two, which evaluated the possible occurrence of carryover in the milk meter milking system, milk samples from 94 cows belonging to two different farms were used. The samples were subjected to pregnancy test using ELISA methodology to measure PAG concentrations and to classify the samples as positive (pregnant), negative (nonpregnant), or suspicious (recheck). We found that the time of milking did not affect the PAG levels. As to the occurrence of carryover in the milk meter, the PAG levels of the samples collected from Farm-2 were heavily influenced by a carryover effect compared with the samples from Farm-1. Thus, milk samples submitted to a pregnancy test can be collected during the morning or the evening milking. When the sample is collected from the milk meters, periodic equipment maintenance should be noted, including whether the milk meter is totally drained between different animals’ milking and equipment cleaning between milking is performed correctly to minimize the occurrence of carryover, thereby avoiding the effect on PAG levels and, consequently, the pregnancy test results. Therefore, a single milk sample can be used for both milk quality tests and pregnancy test.

  16. Bovine milk sampling efficiency for pregnancy-associated glycoproteins (PAG) detection test

    International Nuclear Information System (INIS)

    Silva, H. K. da; Cassoli, L.D.; Pantoja, J.F.C.; Cerqueira, P.H.R.; Coitinho, T.B.; Machado, P.F.

    2016-01-01

    Two experiments were conducted to verify whether the time of day at which a milk sample is collected and the possible carryover in the milking system may affect pregnancy-associated glycoproteins (PAG) levels and, consequently, the pregnancy test results in dairy cows. In experiment one, we evaluated the effect of time of day at which the milk sample is collected from 51 cows. In experiment two, which evaluated the possible occurrence of carryover in the milk meter milking system, milk samples from 94 cows belonging to two different farms were used. The samples were subjected to pregnancy test using ELISA methodology to measure PAG concentrations and to classify the samples as positive (pregnant), negative (nonpregnant), or suspicious (recheck). We found that the time of milking did not affect the PAG levels. As to the occurrence of carryover in the milk meter, the PAG levels of the samples collected from Farm-2 were heavily influenced by a carryover effect compared with the samples from Farm-1. Thus, milk samples submitted to a pregnancy test can be collected during the morning or the evening milking. When the sample is collected from the milk meters, periodic equipment maintenance should be noted, including whether the milk meter is totally drained between different animals’ milking and equipment cleaning between milking is performed correctly to minimize the occurrence of carryover, thereby avoiding the effect on PAG levels and, consequently, the pregnancy test results. Therefore, a single milk sample can be used for both milk quality tests and pregnancy test.

  17. Testing measurement invariance of the schizotypal personality questionnaire-brief scores across Spanish and Swiss adolescents.

    Directory of Open Access Journals (Sweden)

    Javier Ortuño-Sierra

    BACKGROUND: Schizotypy is a complex construct intimately related to psychosis. Empirical evidence indicates that participants with high scores on schizotypal self-report are at a heightened risk for the later development of psychotic disorders. Schizotypal experiences represent the behavioural expression of liability for psychotic disorders. Previous factorial studies have shown that schizotypy is a multidimensional construct similar to that found in patients with schizophrenia. Specifically, using the Schizotypal Personality Questionnaire-Brief (SPQ-B), the three-dimensional model has been widely replicated. However, there has been no in-depth investigation of whether the dimensional structure underlying the SPQ-B scores is invariant across countries. METHODS: The main goal of this study was to examine the measurement invariance of the SPQ-B scores across Spanish and Swiss adolescents. The final sample was made up of 261 Spanish participants (51.7% men; M = 16.04 years) and 241 Swiss participants (52.3% men; M = 15.94 years). RESULTS: The results indicated that Raine et al.'s three-factor model presented adequate goodness-of-fit indices. Moreover, the results supported the measurement invariance (configural and partial strong invariance) of the SPQ-B scores across the two samples. Spanish participants scored higher on the Interpersonal dimension than Swiss participants when latent means were compared. DISCUSSION: The study of measurement equivalence across countries provides preliminary evidence for Raine et al.'s three-factor model and for the cross-cultural validity of the SPQ-B scores in the adolescent population. Future studies should continue to examine the measurement invariance of the schizotypy and psychosis-risk syndromes across cultures.

  18. AP Trends: Tests Soar, Scores Slip--Gaps between Groups Spur Equity Concerns

    Science.gov (United States)

    Cech, Scott J.

    2008-01-01

    More students are taking Advanced Placement tests, but the proportion of tests receiving what is deemed a passing score has dipped, and the mean score is down for the fourth year in a row. Data released here this week by the New York City-based nonprofit organization that owns the AP brand shows that a greater-than-ever proportion of students…

  19. Effect of Mindfulness Meditation on Perceived Stress Scores and Autonomic Function Tests of Pregnant Indian Women.

    Science.gov (United States)

    Muthukrishnan, Shobitha; Jain, Reena; Kohli, Sangeeta; Batra, Swaraj

    2016-04-01

    Various pregnancy complications such as hypertension and preeclampsia have been strongly correlated with maternal stress. One of the connecting links between pregnancy complications and maternal stress is mind-body intervention, which can be part of Complementary and Alternative Medicine (CAM). Biologic measures of stress during pregnancy may be reduced by such interventions. The aim was to evaluate the effect of Mindfulness meditation on perceived stress scores and autonomic function tests of pregnant Indian women. Pregnant Indian women of 12 weeks gestation were randomised to two treatment groups: a test group with Mindfulness meditation and a control group with their usual obstetric care. The effect of Mindfulness meditation on perceived stress scores and on cardiac sympathetic and parasympathetic functions (heart rate variation with respiration, lying to standing ratio, standing to lying ratio and respiratory rate) was evaluated in pregnant Indian women. There was a significant decrease in perceived stress scores, a significant decrease of blood pressure response to cold pressor test and a significant increase in heart rate variability in the test group. The results of this study suggest that mindfulness meditation improves parasympathetic functions in pregnant women and is a powerful modulator of the sympathetic nervous system during pregnancy.

  20. Failure-censored accelerated life test sampling plans for Weibull distribution under expected test time constraint

    International Nuclear Information System (INIS)

    Bai, D.S.; Chun, Y.R.; Kim, J.G.

    1995-01-01

    This paper considers the design of life-test sampling plans based on failure-censored accelerated life tests. The lifetime distribution of products is assumed to be Weibull with a scale parameter that is a log-linear function of a (possibly transformed) stress. Two levels of stress higher than the use condition stress, high and low, are used. Sampling plans are obtained that have equal expected test times at the high and low test stresses, satisfy the producer's and consumer's risk requirements, and minimize the asymptotic variance of the test statistic used to decide lot acceptability. The properties of the proposed life-test sampling plans are investigated.
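
    The lifetime model described in this abstract can be illustrated with a short Python sketch. It assumes the common form log(scale) = beta0 + beta1 * stress for the log-linear relationship; the coefficient values, the shape parameter, and the stress levels below are hypothetical and are not taken from the paper.

```python
import math


def weibull_scale(stress, beta0, beta1):
    """Weibull scale parameter under an assumed log-linear stress relationship."""
    return math.exp(beta0 + beta1 * stress)


def weibull_reliability(t, shape, scale):
    """Probability that a unit survives beyond time t under a Weibull lifetime."""
    return math.exp(-((t / scale) ** shape))


# Hypothetical parameters, chosen only for illustration.
BETA0, BETA1, SHAPE = 8.0, -0.5, 1.5

for label, stress in (("use", 1.0), ("low", 2.0), ("high", 3.0)):
    scale = weibull_scale(stress, BETA0, BETA1)
    surv = weibull_reliability(500.0, SHAPE, scale)
    print(f"{label:>4} stress: scale = {scale:8.1f}, P(T > 500) = {surv:.3f}")
```

    With a negative slope, a higher stress gives a smaller scale parameter and therefore shorter lifetimes, which is what makes an accelerated test shorter than a test at the use condition.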

  1. Relationships between the handball-specific complex test, non-specific field tests and the match performance score in elite professional handball players.

    Science.gov (United States)

    Hermassi, Souhail; Chelly, Mohamed-Souhaiel; Wollny, Rainer; Hoffmeyer, Birgit; Fieseler, Georg; Schulze, Stephan; Irlenbusch, Lars; Delank, Karl-Stefan; Shephard, Roy J; Bartels, Thomas; Schwesig, René

    2018-06-01

    This study assessed the validity of the handball-specific complex test (HBCT) and two non-specific field tests in professional elite handball athletes, using the match performance score (MPS) as the gold standard of performance. Thirteen elite male handball players (age: 27.4±4.8 years; premier German league) performed the HBCT, the Yo-Yo Intermittent Recovery (YYIR) test and a repeated shuttle sprint ability (RSA) test at the beginning of pre-season training. The RSA results were evaluated in terms of best time, total time, and fatigue decrement. Heart rates (HR) were assessed at selected times throughout all tests; the recovery HR was measured immediately post-test and 10 minutes later. The match performance score was based on various handball specific parameters (e.g., field goals, assists, steals, blocks, and technical mistakes) as seen during all matches of the immediately subsequent season (2015/2016). The parameters of run 1, run 2, and HR recovery at minutes 6 and 10 of the RSA test all showed a variance of more than 10% (range: 11-15%). However, the variance of scores for the YYIR test was much smaller (range: 1-7%). The resting HR (r2=0.18), HR recovery at minute 10 (r2=0.10), lactate concentration at rest (r2=0.17), recovery of heart rate from 0 to 10 minutes (r2=0.15), and velocity of second throw at first trial (r2=0.37) were the most valid HBCT parameters. Much effort is necessary to assess MPS and to develop valid tests. Speed and the rate of functional recovery seem the best predictors of competitive performance for elite handball players.

  2. Validity of GRE General Test Scores and TOEFL Scores for Graduate Admission to a Technical University in Western Europe

    Science.gov (United States)

    Zimmermann, Judith; von Davier, Alina A.; Buhmann, Joachim M.; Heinimann, Hans R.

    2018-01-01

    Graduate admission has become a critical process in tertiary education, whereby selecting valid admissions instruments is key. This study assessed the validity of Graduate Record Examination (GRE) General Test scores for admission to Master's programmes at a technical university in Europe. We investigated the indicative value of GRE scores for the…

  3. The Formalization of Fairness: Issues in Testing for Measurement Invariance Using Subtest Scores

    Science.gov (United States)

    Molenaar, Dylan; Borsboom, Denny

    2013-01-01

    Measurement invariance is an important prerequisite for the adequate comparison of group differences in test scores. In psychology, measurement invariance is typically investigated by means of linear factor analyses of subtest scores. These subtest scores typically result from summing the item scores. In this paper, we discuss 4 possible problems…

  4. Explaining the black-white gap in cognitive test scores: Toward a theory of adverse impact.

    Science.gov (United States)

    Cottrell, Jonathan M; Newman, Daniel A; Roisman, Glenn I

    2015-11-01

    In understanding the causes of adverse impact, a key parameter is the Black-White difference in cognitive test scores. To advance theory on why Black-White cognitive ability/knowledge test score gaps exist, and on how these gaps develop over time, the current article proposes an inductive explanatory model derived from past empirical findings. According to this theoretical model, Black-White group mean differences in cognitive test scores arise from the following racially disparate conditions: family income, maternal education, maternal verbal ability/knowledge, learning materials in the home, parenting factors (maternal sensitivity, maternal warmth and acceptance, and safe physical environment), child birth order, and child birth weight. Results from a 5-wave longitudinal growth model estimated on children in the NICHD Study of Early Child Care and Youth Development from ages 4 through 15 years show significant Black-White cognitive test score gaps throughout early development that did not grow significantly over time (i.e., significant intercept differences, but not slope differences). Importantly, the racially disparate conditions listed above can account for the relation between race and cognitive test scores. We propose a parsimonious 3-Step Model that explains how cognitive test score gaps arise, in which race relates to maternal disadvantage, which in turn relates to parenting factors, which in turn relate to cognitive test scores. This model and results offer to fill a need for theory on the etiology of the Black-White ethnic group gap in cognitive test scores, and attempt to address a missing link in the theory of adverse impact. (c) 2015 APA, all rights reserved.

  5. The Effects of Teacher and Teacher-librarian High-end Collaboration on Inquiry-based Project Reports and School Monthly Test Scores of Fifth-grade Students

    Directory of Open Access Journals (Sweden)

    Hai-Hon Chen

    2015-07-01

    The purpose of this study was twofold. The first purpose was to establish a high-level collaborative model of integrated instruction between a social studies teacher and a teacher-librarian. The second purpose was to investigate the effects of high-end collaboration on individual and group inquiry-based project reports, as well as on the monthly test scores of fifth-grade students. A quasi-experimental method was adopted, with two classes of elementary school fifth graders in Tainan Municipal City, Taiwan, as the sample. Students were randomly assigned to experimental conditions by class. Twenty-eight students in the experimental group were taught collaboratively by the social studies teacher and the teacher-librarian, while 27 students in the control group were taught separately by the teacher using a didactic teaching method. The Inquiry-Based Project Record, Inquiry-Based Project Rubrics, and school monthly test scores were used as instruments for collecting data. A t-test and correlation were used to analyze the data. The results indicate that: (1) The high-end collaboration model between the social studies teacher and the teacher-librarian was established and implemented well in the classroom. (2) There was a significant difference between the experimental group and the control group in individual and group inquiry-based project reports. Students taught by the collaborating teachers earned higher scores on both kinds of inquiry-based project report than those taught separately by the teachers. Students in the experimental group also earned higher school monthly test scores than those in the control group. Suggestions for teachers' high-end collaboration and for future research are provided in this paper.

  6. Performances on Rey Auditory Verbal Learning Test and Rey Complex Figure Test in a healthy, elderly Danish sample--reference data and validity issues

    DEFF Research Database (Denmark)

    Vogel, Asmus; Stokholm, Jette; Jørgensen, Kasper

    2012-01-01

    . The RCFT copy score was significantly related to age and the DART score. On RCFT recall a highly significant difference was found between persons who could make a faultless copy and persons with incomplete copy performance. Thus, this study presents separate data for RCFT recall scores according...... to the subjects' copying performance (in separate tables for age and education groups). For all measures on both RAVLT and RCFT wide distributions of scores were found and the impact of this broad score range on the tests' discriminative validity is discussed. RAVLT performances for elderly were similar...... to previous published meta-norms, but the included sample of elderly Danes performed better on RCFT (copy and recall) than elderly from the United States....

  7. Psychometric Properties of Raw and Scale Scores on Mixed-Format Tests

    Science.gov (United States)

    Kolen, Michael J.; Lee, Won-Chan

    2011-01-01

    This paper illustrates that the psychometric properties of scores and scales that are used with mixed-format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is…

  8. Operability test procedure for PFP wastewater sampling facility

    International Nuclear Information System (INIS)

    Hirzel, D.R.

    1995-01-01

    Document provides instructions for performing the Operability Test of the 225-WC Wastewater Sampling Station which monitors the discharge to the Treated Effluent Disposal Facility from the Plutonium Finishing Plant. This Operability Test Procedure (OTP) has been prepared to verify correct configuration and performance of the PFP Wastewater sampling system installed in Building 225-WC located outside the perimeter fence southeast of the Plutonium Finishing Plant (PFP). The objective of this test is to ensure the equipment in the sampling facility operates in a safe and reliable manner. The sampler consists of two Manning Model S-5000 units which are rate controlled by the Milltronics Ultrasonic flowmeter at manhole No.C4 and from a pH measuring system with the sensor in the stream adjacent to the sample point. The intent of the dual sampling system is to utilize one unit to sample continuously at a rate proportional to the wastewater flow rate so that the aggregate tests are related to the overall flow and thereby eliminate isolated analyses. The second unit will only operate during a high or low pH excursion of the stream (hence the need for a pH control). The major items in this OTP include testing of the Manning Sampler System and associated equipment including the pH measuring and control system, the conductivity monitor, and the flow meter

  9. Parent Ratings of Impulsivity and Inhibition Predict State Testing Scores

    Directory of Open Access Journals (Sweden)

    Rebecca A. Lundwall

    2018-03-01

    One principle of cognitive development is that earlier intervention for educational difficulties tends to improve outcomes such as future educational and career success. One possible way to help students who struggle is to determine if they process information differently. Such determination might lead to clues for interventions. For example, early information processing requires attention before the information can be identified, encoded, and stored. The aim of the present study was to investigate whether parent ratings of inattention, inhibition, and impulsivity, together with error rate on a reflexive attention task, could be used to predict child scores on state standardized tests. Finding such an association could provide assistance to educators in identifying academically struggling children who might require targeted educational interventions. Children (N = 203) were invited to complete a peripheral cueing task (which measures the automatic reorienting of the brain's attentional resources from one location to another). While the children completed the task, their parents completed a questionnaire. The questionnaire gathered information on broad indicators of child functioning, including observable behaviors of impulsivity, inattention, and inhibition, as well as state academic scores (which the parent retrieved online from their school). We used sequential regression to analyze contributions of error rate and parent-rated behaviors in predicting six academic scores. In one of the six analyses (for science), we found that the improvement was significant from the simplified model (with only family income, child age, and sex as predictors) to the full model (adding error rate and three parent-rated behaviors). Two additional analyses (reading and social studies) showed near significant improvement from simplified to full models. Parent-rated behaviors were significant predictors in all three of these analyses. In the reading score analysis

  10. Tale of Two Patent Ductus Arteriosus Severity Scores: Similarities and Differences.

    Science.gov (United States)

    Fink, Daniel; El-Khuffash, Afif; McNamara, Patrick J; Nitzan, Itamar; Hammerman, Cathy

    2018-01-01

    Several echocardiographic scoring systems have been developed to assess the severity of patent ductus arteriosus (PDA) shunting in preterm infants. The objective of this study was to compare the ability of two different scoring systems to evaluate the hemodynamic significance of the PDA and to predict long-term PDA-associated morbidities. El-Khuffash cohort (previously described) was derived from a multicenter, prospective, observational study conducted in tertiary neonatal intensive care units in Ireland, Canada, and Australia. A total of 141 infants with a mean gestational age of 26 ± 1.4 weeks and a mean birth weight of 952 ± 235 g were evaluated on day 2 of life. The two scores were well correlated with each other and both scores positively predicted chronic lung disease/death in this population. There appears to be an overall stepwise progression in the incidence of poor outcome parameters from "closed" to "borderline" to "hemodynamically significant" PDA. Both the El-Khuffash and Shaare Zedek scores are predictive of PDA-associated morbidities. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.

  11. The Apgar score has survived the test of time.

    Science.gov (United States)

    Finster, Mieczyslaw; Wood, Margaret

    2005-04-01

    In 1953, Virginia Apgar, M.D. published her proposal for a new method of evaluation of the newborn infant. The avowed purpose of this paper was to establish a simple and clear classification of newborn infants which can be used to compare the results of obstetric practices, types of maternal pain relief and the results of resuscitation. Having considered several objective signs pertaining to the condition of the infant at birth she selected five that could be evaluated and taught to the delivery room personnel without difficulty. These signs were heart rate, respiratory effort, reflex irritability, muscle tone and color. Sixty seconds after the complete birth of the baby a rating of zero, one or two was given to each sign, depending on whether it was absent or present. Virginia Apgar reviewed anesthesia records of 1025 infants born alive at Columbia Presbyterian Medical Center during the period of this report. All had been rated by her method. Infants in poor condition scored 0-2, infants in fair condition scored 3-7, while scores 8-10 were achieved by infants in good condition. The most favorable score 1 min after birth was obtained by infants delivered vaginally with the occiput the presenting part (average 8.4). Newborns delivered by version and breech extraction had the lowest score (average 6.3). Infants delivered by cesarean section were more vigorous (average score 8.0) when spinal was the method of anesthesia versus an average score of 5.0 when general anesthesia was used. Correlating the 60 s score with neonatal mortality, Virginia found that mature infants receiving 0, 1 or 2 scores had a neonatal death rate of 14%; those scoring 3, 4, 5, 6 or 7 had a death rate of 1.1%; and those in the 8-10 score group had a death rate of 0.13%. She concluded that the prognosis of an infant is excellent if he receives one of the upper three scores, and poor if one of the lowest three scores.
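
    The scoring scheme summarized in this abstract (five signs, each rated 0, 1 or 2, with totals of 0-2, 3-7 and 8-10 described as poor, fair and good condition) can be written as a small Python sketch. The function and key names are illustrative choices, and the example ratings are hypothetical; this is not clinical software.

```python
# The five signs named in the abstract; the identifier spellings are illustrative.
SIGNS = ("heart_rate", "respiratory_effort", "reflex_irritability",
         "muscle_tone", "color")


def apgar_total(ratings):
    """Sum the five sign ratings, each of which must be 0, 1 or 2."""
    for sign in SIGNS:
        if sign not in ratings:
            raise ValueError(f"missing rating for {sign}")
        if ratings[sign] not in (0, 1, 2):
            raise ValueError(f"rating for {sign} must be 0, 1 or 2")
    return sum(ratings[sign] for sign in SIGNS)


def condition(total):
    """Map a total score to the condition bands given in the abstract."""
    if not 0 <= total <= 10:
        raise ValueError("total must be between 0 and 10")
    if total <= 2:
        return "poor"
    if total <= 7:
        return "fair"
    return "good"


# Hypothetical ratings for one infant, 60 seconds after birth.
example = {"heart_rate": 2, "respiratory_effort": 2, "reflex_irritability": 1,
           "muscle_tone": 2, "color": 1}
total = apgar_total(example)
print(total, condition(total))
```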

  12. Effects of Classroom Ventilation Rate and Temperature on Students' Test Scores.

    Science.gov (United States)

    Haverinen-Shaughnessy, Ulla; Shaughnessy, Richard J

    2015-01-01

    Using a multilevel approach, we estimated the effects of classroom ventilation rate and temperature on academic achievement. The analysis is based on measurement data from a 70 elementary school district (140 fifth grade classrooms) from Southwestern United States, and student level data (N = 3109) on socioeconomic variables and standardized test scores. There was a statistically significant association between ventilation rates and mathematics scores, and it was stronger when the six classrooms with high ventilation rates that were indicated as outliers were filtered (> 7.1 l/s per person). The association remained significant when prior year test scores were included in the model, resulting in less unexplained variability. Students' mean mathematics scores (average 2286 points) were increased by up to eleven points (0.5%) per each liter per second per person increase in ventilation rate within the range of 0.9-7.1 l/s per person (estimated effect size 74 points). There was an additional increase of 12-13 points per each 1°C decrease in temperature within the observed range of 20-25°C (estimated effect size 67 points). Effects of similar magnitude but higher variability were observed for reading and science scores. In conclusion, maintaining adequate ventilation and thermal comfort in classrooms could significantly improve academic achievement of students.

  13. Refining Ovarian Cancer Test accuracy Scores (ROCkeTS): protocol for a prospective longitudinal test accuracy study to validate new risk scores in women with symptoms of suspected ovarian cancer.

    Science.gov (United States)

    Sundar, Sudha; Rick, Caroline; Dowling, Francis; Au, Pui; Snell, Kym; Rai, Nirmala; Champaneria, Rita; Stobart, Hilary; Neal, Richard; Davenport, Clare; Mallett, Susan; Sutton, Andrew; Kehoe, Sean; Timmerman, Dirk; Bourne, Tom; Van Calster, Ben; Gentry-Maharaj, Aleksandra; Menon, Usha; Deeks, Jon

    2016-08-09

    Ovarian cancer (OC) is associated with non-specific symptoms such as bloating, making accurate diagnosis challenging: only 1 in 3 women with OC presents through primary care referral. National Institute for Health and Care Excellence guidelines recommend sequential testing with CA125 and routine ultrasound in primary care. However, these diagnostic tests have limited sensitivity or specificity. Improving accurate triage in women with vague symptoms is likely to improve mortality by streamlining referral and care pathways. The Refining Ovarian Cancer Test Accuracy Scores (ROCkeTS; HTA 13/13/01) project will derive and validate new tests/risk prediction models that estimate the probability of having OC in women with symptoms. This protocol refers to the prospective study only (phase III). ROCkeTS comprises four parallel phases. The full ROCkeTS protocol can be found at http://www.birmingham.ac.uk/ROCKETS. Phase III is a prospective test accuracy study. The study will recruit 2450 patients from 15 UK sites. Recruited patients complete symptom and anxiety questionnaires, donate a serum sample and undergo ultrasound scored as per International Ovarian Tumour Analysis (IOTA) criteria. Recruitment is at rapid access clinics, emergency departments and elective clinics. Models to be evaluated include those based on ultrasound derived by the IOTA group and novel models derived from analysis of existing data sets. Estimates of sensitivity, specificity, c-statistic (area under receiver operating curve), positive predictive value and negative predictive value of diagnostic tests will be evaluated, and a calibration plot for models will be presented. ROCkeTS has received ethical approval from the NHS West Midlands REC (14/WM/1241) and is registered on the controlled trials website (ISRCTN17160843) and the National Institute of Health Research Cancer and Reproductive Health portfolios.
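
    Four of the accuracy measures named in this protocol follow directly from a 2x2 table of test result against true disease status. The Python sketch below uses the standard definitions; the counts are hypothetical and are not ROCkeTS data, and the function name is an illustrative choice.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Standard test accuracy measures from a 2x2 table.

    tp: diseased, test positive      fp: not diseased, test positive
    fn: diseased, test negative      tn: not diseased, test negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly flagged
        "specificity": tn / (tn + fp),  # non-diseased patients correctly cleared
        "ppv": tp / (tp + fp),          # positive results that are true positives
        "npv": tn / (tn + fn),          # negative results that are true negatives
    }


# Hypothetical counts, for illustration only.
measures = accuracy_measures(tp=90, fp=50, fn=10, tn=850)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

    Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common the disease is among the patients tested.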

  14. Sampling analytical tests and destructive tests for quality assurance

    International Nuclear Information System (INIS)

    Saas, A.; Pasquini, S.; Jouan, A.; Angelis, de; Hreen Taywood, H.; Odoj, R.

    1990-01-01

    In the context of the third programme of the European Communities on the monitoring of radioactive waste, various methods have been developed for the performance of sampling and measuring tests on encapsulated waste of low and medium level activity, on the one hand, and of high level activity, on the other hand. The purpose was to provide better quality assurance for products to be stored on an interim or long-term basis. Various sampling methods are proposed, such as: - sampling of raw waste before conditioning and determination of the representative aliquot, - sampling of encapsulated waste on process output, - sampling of core specimens subjected to measurement before and after cutting. Equipment suitable for these sampling procedures has been developed and, in the case of core samples, a comparison of techniques has been made. The results are described for the various analytical tests carried out on the samples, such as: - mechanical tests, - radiation resistance, - fire resistance, - lixiviation, - determination of free water, - biodegradation, - water resistance, - chemical and radiochemical analysis. Wherever possible, these tests were compared with non-destructive tests on full-scale packages and some correlations are given. This work has made it possible to improve and clarify sample optimization, refine sampling techniques and methodologies, and draw up characterization procedures. It also provided an occasion for a first collaboration between the laboratories responsible for these studies, which will be furthered in the scope of the 1990-1994 programme.

  15. Test-retest reliability and minimal detectable change scores for sit-to-stand-to-sit tests, the six-minute walk test, the one-leg heel-rise test, and handgrip strength in people undergoing hemodialysis.

    Science.gov (United States)

    Segura-Ortí, Eva; Martínez-Olmos, Francisco José

    2011-08-01

    Determining the relative and absolute reliability of outcomes of physical performance tests for people undergoing hemodialysis is necessary to discriminate between the true effects of exercise interventions and the inherent variability of this cohort. The aims of this study were to assess the relative reliability of sit-to-stand-to-sit tests (the STS-10, which measures the time [in seconds] required to complete 10 full stands from a sitting position, and the STS-60, which measures the number of repetitions achieved in 60 seconds), the Six-Minute Walk Test (6MWT), the one-leg heel-rise test, and the handgrip strength test and to calculate minimal detectable change (MDC) scores in people undergoing hemodialysis. This study was a prospective, nonexperimental investigation. Thirty-nine people undergoing hemodialysis at 2 clinics in Spain were contacted. Study participants performed the STS-10 (n=37), the STS-60 (n=37), and the 6MWT (n=36). At one of the settings, the participants also performed the one-leg heel-rise test (n=21) and the handgrip strength test (n=12) on both the right and the left sides. Participants attended 2 testing sessions 1 to 2 weeks apart. High intraclass correlation coefficients (≥.88) were found for all tests, suggesting good relative reliability. The MDC scores at 90% confidence intervals were as follows: 8.4 seconds for the STS-10, 4 repetitions for the STS-60, 66.3 m for the 6MWT, 3.4 kg for handgrip strength (force-generating capacity), 3.7 repetitions for the one-leg heel-rise test with the right leg, and 5.2 repetitions for the one-leg heel-rise test with the left leg. Limitations: a limited sample of patients was used in this study. The STS-10, STS-60, 6MWT, one-leg heel-rise test, and handgrip strength test are reliable outcome measures. The MDC scores at 90% confidence intervals for these tests will help to determine whether a change is due to error or to an intervention.
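
    The abstract does not state how the MDC scores were computed. A minimal sketch, assuming the conventional formula MDC = z × SEM × √2 with SEM = SD × √(1 − ICC), might look as follows. The function name and the input values are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist


def minimal_detectable_change(sd, icc, confidence=0.90):
    """MDC for a test-retest design, using the conventional formula (an assumption here)."""
    # Standard error of measurement from the between-subject SD and the ICC
    sem = sd * math.sqrt(1.0 - icc)
    # Two-sided z multiplier: about 1.645 for 90% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # sqrt(2) accounts for measurement error in both test sessions
    return z * sem * math.sqrt(2.0)


# Hypothetical inputs for illustration only
mdc90 = minimal_detectable_change(sd=10.0, icc=0.91)
print(round(mdc90, 2))
```

    Under this formula, a higher ICC or a smaller SD gives a smaller MDC, so a given change is more easily distinguished from measurement error.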

  16. Comparing the Scoring of Human Decomposition from Digital Images to Scoring Using On-site Observations.

    Science.gov (United States)

    Dabbs, Gretchen R; Bytheway, Joan A; Connor, Melissa

    2017-09-01

    When in forensic casework or empirical research in-person assessment of human decomposition is not possible, the sensible substitution is color photographic images. To date, no research has confirmed the utility of color photographic images as a proxy for in situ observation of the level of decomposition. Sixteen observers scored photographs of 13 human cadavers in varying decomposition stages (PMI 2-186 days) using the Total Body Score system (total n = 929 observations). The on-site TBS was compared with recorded observations from digital color images using a paired samples t-test. The average difference between on-site and photographic observations was -0.20 (t = -1.679, df = 928, p = 0.094). Individually, only two observers, both students with human decomposition based on digital images can be substituted for assessments based on observation of the corpse in situ, when necessary. © 2017 American Academy of Forensic Sciences.

  17. A confirmatory test of the underlying factor structure of scores on the collective self-esteem scale in two independent samples of Black Americans.

    Science.gov (United States)

    Utsey, Shawn O; Constantine, Madonna G

    2006-04-01

    In this study, we examined the factor structure of the Collective Self-Esteem Scale (CSES; Luhtanen & Crocker, 1992) across 2 separate samples of Black Americans. The CSES was administered to a sample of Black American adolescents (n = 538) and a community sample of Black American adults (n = 313). Results of confirmatory factor analyses (CFAs), however, did not support the original 4-factor model identified by Luhtanen and Crocker (1992) as providing an adequate fit to the data for these samples. Furthermore, an exploratory CFA procedure failed to find a CSES factor structure that could be replicated across the 2 samples of Black Americans. We present and discuss implications of the findings.

  18. A boundary-optimized rejection region test for the two-sample binomial problem.

    Science.gov (United States)

    Gabriel, Erin E; Nason, Martha; Fay, Michael P; Follmann, Dean A

    2018-03-30

    Testing the equality of 2 proportions for a control group versus a treatment group is a well-researched statistical problem. In some settings, there may be strong historical data that allow one to reliably expect that the control proportion is one, or nearly so. While one-sample tests or comparisons to historical controls could be used, neither can rigorously control the type I error rate in the event the true control rate changes. In this work, we propose an unconditional exact test that exploits the historical information while controlling the type I error rate. We sequentially construct a rejection region by first maximizing the rejection region in the space where all controls have an event, subject to the constraint that our type I error rate does not exceed α for any true event rate; then with any remaining α we maximize the additional rejection region in the space where one control avoids the event, and so on. When the true control event rate is one, our test is the most powerful nonrandomized test for all points in the alternative space. When the true control event rate is nearly one, we demonstrate that our test has equal or higher mean power, averaging over the alternative space, than a variety of well-known tests. For the comparison of 4 controls and 4 treated subjects, our proposed test has higher power than all comparator tests. We demonstrate the properties of our proposed test by simulation and use our method to design a malaria vaccine trial. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.

  19. High Test Scores: The Wrong Road to National Economic Success

    Science.gov (United States)

    Baker, Keith

    2011-01-01

    A widely held view is that good schools are essential to a nation's international economic success and that high test scores on international tests of academic skills and knowledge indicate how good a nation's schools are. The widespread belief that good schools are an important contributor to a nation's economic success in the world is supported…

  20. Relationships between spatial activities and scores on the mental rotation test as a function of sex.

    Science.gov (United States)

    Ginn, Sheryl R; Pickens, Stefanie J

    2005-06-01

    Previous results suggested that female college students' scores on the Mental Rotations Test might be related to their prior experience with spatial tasks. For example, women who played video games scored better on the test than their non-game-playing peers, whereas playing video games was not related to men's scores. The present study examined whether participation in different types of spatial activities would be related to women's performance on the Mental Rotations Test. 31 men and 59 women enrolled at a small, private church-affiliated university and majoring in art or music as well as students who participated in intercollegiate athletics completed the Mental Rotations Test. Women's scores on the Mental Rotations Test benefitted from experience with spatial activities; the more types of experience the women had, the better their scores. Thus women who were athletes, musicians, or artists scored better than those women who had no experience with these activities. The opposite results were found for the men. Efforts are currently underway to assess how length of experience and which types of experience are related to scores.

  1. Effects of Classroom Ventilation Rate and Temperature on Students' Test Scores.

    Directory of Open Access Journals (Sweden)

    Ulla Haverinen-Shaughnessy

    Using a multilevel approach, we estimated the effects of classroom ventilation rate and temperature on academic achievement. The analysis is based on measurement data from a 70 elementary school district (140 fifth grade classrooms) from Southwestern United States, and student level data (N = 3109) on socioeconomic variables and standardized test scores. There was a statistically significant association between ventilation rates and mathematics scores, and it was stronger when the six classrooms with high ventilation rates that were indicated as outliers were filtered (> 7.1 l/s per person). The association remained significant when prior year test scores were included in the model, resulting in less unexplained variability. Students' mean mathematics scores (average 2286 points) were increased by up to eleven points (0.5%) per each liter per second per person increase in ventilation rate within the range of 0.9-7.1 l/s per person (estimated effect size 74 points). There was an additional increase of 12-13 points per each 1°C decrease in temperature within the observed range of 20-25°C (estimated effect size 67 points). Effects of similar magnitude but higher variability were observed for reading and science scores. In conclusion, maintaining adequate ventilation and thermal comfort in classrooms could significantly improve academic achievement of students.

  2. Effects of Classroom Ventilation Rate and Temperature on Students’ Test Scores

    Science.gov (United States)

    2015-01-01

    Using a multilevel approach, we estimated the effects of classroom ventilation rate and temperature on academic achievement. The analysis is based on measurement data from a 70 elementary school district (140 fifth grade classrooms) from Southwestern United States, and student level data (N = 3109) on socioeconomic variables and standardized test scores. There was a statistically significant association between ventilation rates and mathematics scores, and it was stronger when the six classrooms with high ventilation rates that were indicated as outliers were filtered (> 7.1 l/s per person). The association remained significant when prior year test scores were included in the model, resulting in less unexplained variability. Students’ mean mathematics scores (average 2286 points) were increased by up to eleven points (0.5%) per each liter per second per person increase in ventilation rate within the range of 0.9–7.1 l/s per person (estimated effect size 74 points). There was an additional increase of 12–13 points per each 1°C decrease in temperature within the observed range of 20–25°C (estimated effect size 67 points). Effects of similar magnitude but higher variability were observed for reading and science scores. In conclusion, maintaining adequate ventilation and thermal comfort in classrooms could significantly improve academic achievement of students. PMID:26317643

  3. Examining the reliability of ADAS-Cog change scores.

    Science.gov (United States)

    Grochowalski, Joseph H; Liu, Ying; Siedlecki, Karen L

    2016-09-01

    The purpose of this study was to estimate and examine ways to improve the reliability of change scores on the Alzheimer's Disease Assessment Scale, Cognitive Subtest (ADAS-Cog). The sample, provided by the Alzheimer's Disease Neuroimaging Initiative, included individuals with Alzheimer's disease (AD) (n = 153) and individuals with mild cognitive impairment (MCI) (n = 352). All participants were administered the ADAS-Cog at baseline and 1 year, and change scores were calculated as the difference in scores over the 1-year period. Three types of change score reliabilities were estimated using multivariate generalizability. Two methods to increase change score reliability were evaluated: reweighting the subtests of the scale and adding more subtests. Reliability of ADAS-Cog change scores over 1 year was low for both the AD sample (ranging from .53 to .64) and the MCI sample (.39 to .61). Reweighting the change scores from the AD sample improved reliability (.68 to .76), but lengthening provided no useful improvement for either sample. The MCI change scores had low reliability, even with reweighting and adding additional subtests. The ADAS-Cog scores had low reliability for measuring change. Researchers using the ADAS-Cog should estimate and report reliability for their use of the change scores. The ADAS-Cog change scores are not recommended for assessment of meaningful clinical change.

  4. Exploration of analysis methods for diagnostic imaging tests: problems with ROC AUC and confidence scores in CT colonography.

    Science.gov (United States)

    Mallett, Susan; Halligan, Steve; Collins, Gary S; Altman, Doug G

    2014-01-01

    Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity with area under the Receiver Operating Characteristic Curve (ROC AUC) for the evaluation of CT colonography for the detection of polyps, either with or without computer assisted detection. In a multireader multicase study of 10 readers and 107 cases we compared sensitivity and specificity, using radiological reporting of the presence or absence of polyps, to ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are due to statistical methods. Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were problems using confidence scores; in assigning scores to all cases; in use of zero scores when no polyps were identified; the bimodal non-normal distribution of scores; fitting ROC curves due to extrapolation beyond the study data; and the undue influence of a few false positive results. Variation due to use of different ROC methods exceeded differences between test results for ROC AUC. The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found sensitivity and specificity were a more reliable and clinically appropriate method to compare diagnostic tests.
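
    The sensitivity of ROC AUC to tied confidence scores, such as zero scores when no polyp is identified, can be illustrated with the rank-based (Mann-Whitney) form of the AUC. This is a minimal sketch, not the analysis used in the study; the function name and the scores are hypothetical.

```python
def roc_auc_from_scores(pos_scores, neg_scores):
    """Empirical AUC: P(positive score > negative score) + 0.5 * P(tie)."""
    wins = 0
    ties = 0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))


# Hypothetical confidence scores (0-100); zeros mean "no polyp reported"
with_polyp = [0, 0, 80, 90]
without_polyp = [0, 0, 0, 10]

auc = roc_auc_from_scores(with_polyp, without_polyp)
print(auc)
```

    In this toy data set, 6 of the 16 case pairs are tied at zero, and the single false positive score of 10 outranks both missed cases, so a handful of scores largely determines the result.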

  5. Clinical use of the ABO-Scoring Index: reliability and subtraction frequency.

    Science.gov (United States)

    Lieber, William S; Carlson, Sean K; Baumrind, Sheldon; Poulton, Donald R

    2003-10-01

    This study tested the reliability and subtraction frequency of the study model-scoring system of the American Board of Orthodontists (ABO). We used a sample of 36 posttreatment study models that were selected randomly from six different orthodontic offices. Intrajudge and interjudge reliability was calculated using nonparametric statistics (Spearman rank coefficient, Wilcoxon, Kruskal-Wallis, and Mann-Whitney tests). We found differences ranging from 3 to 6 subtraction points (total score) for intrajudge scoring between two sessions. For overall total ABO score, the average correlation was .77. Intrajudge correlation was greatest for occlusal relationships and least for interproximal contacts. Interjudge correlation for ABO score averaged r = .85. Correlation was greatest for buccolingual inclination and least for overjet. The data show that some judges, on average, were much more lenient than others and that this resulted in a range of total scores between 19.7 and 27.5. Most of the deductions were found in the buccal segments and most were related to the second molars. We present these findings in the context of clinicians preparing for the ABO phase III examination and for orthodontists in their ongoing evaluation of clinical results.

  6. Advantages of micronuclei analysis through images autocapturing and screen scoring

    International Nuclear Information System (INIS)

    González, J.E.; Martínez-López, W.

    2015-01-01

    The cytokinesis-block micronucleus (CBMN) test is a quantitative assay for genetic toxicity assessment. One of the advantages of the MN assay is its amenability to automation. Different types of cells have been used to evaluate genetic damage through the MN assay, such as human lymphocytes and rodent cell lines (i.e. CHO, V79, CHL and L5178Y). MN quantification is a time-consuming process and several efforts have been made to automate it. Some of them include an operator checking step, like the PathFinder CellScan System, or are fully automated, such as MNScore from MetaSystems. Usually, fully automated systems detect two or three times fewer MN than visual scoring. In some cases, the impact of false positive detection is reduced with a visual detection step. In the present work we have tested a combination of image autocapturing of CHOK1 cells previously treated with bleomycin (0, 2.5, 5.0 and 10.0 μg/ml) or UVC (0, 4, 8 and 16 J/m²) with screen scoring. Capturing images using the AutoCapture option of Metafer 4 from MetaSystems (GmbH, Germany) plus screen scoring renders results similar to microscopic live scoring in terms of MN cell frequency. The resultant bias from the Bland–Altman analysis was -1.1% with confidence intervals between -2.2% and -0.1%, indicating an acceptable agreement between both MN scoring methods. However, the mean time devoted to live microscope scoring per sample was 159 minutes compared to 39 minutes for microscope image autocapturing and screen scoring. Therefore, it becomes advantageous to combine autocapturing of microscope images with screen scoring when many samples have to be analyzed for radiological biodosimetry purposes. (authors)

  7. Estimation of sample size and testing power (Part 4).

    Science.gov (United States)

    Hu, Liang-ping; Bao, Xiao-lei; Guan, Xue; Zhou, Shi-guo

    2012-01-01

    Sample size estimation is necessary for any experimental or survey research. An appropriate estimation of sample size based on known information and statistical knowledge is of great significance. This article introduces methods of sample size estimation for the difference test for data with the design of one factor with two levels, including sample size estimation formulas and their realization, based on the formulas and on the POWER procedure of SAS software, for both quantitative and qualitative data. In addition, this article presents worked examples, which can guide researchers in implementing the repetition principle during the research design phase.
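
    The article's own formulas and SAS code are not reproduced in this record. As a rough illustration of a difference test on qualitative data with one factor at two levels, the sketch below uses the standard normal-approximation formula for comparing two independent proportions. The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided test of p1 versus p2 (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)               # quantile for the target power
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)       # unpooled variance of the difference
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical response rates of 60% and 40%
n_difference = n_per_group_two_proportions(0.60, 0.40)
print(n_difference)
```

    Dedicated procedures may apply pooled variances or continuity corrections, so their output can differ slightly from this approximation.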

  8. A knowledge-based theory of rising scores on "culture-free" tests.

    Science.gov (United States)

    Fox, Mark C; Mitchum, Ainsley L

    2013-08-01

    Secular gains in intelligence test scores have perplexed researchers since they were documented by Flynn (1984, 1987). Gains are most pronounced on abstract, so-called culture-free tests, prompting Flynn (2007) to attribute them to problem-solving skills availed by scientifically advanced cultures. We propose that recent-born individuals have adopted an approach to analogy that enables them to infer higher level relations requiring roles that are not intrinsic to the objects that constitute initial representations of items. This proposal is translated into item-specific predictions about differences between cohorts in pass rates and item-response patterns on the Raven's Matrices (Flynn, 1987), a seemingly culture-free test that registers the largest Flynn effect. Consistent with predictions, archival data reveal that individuals born around 1940 are less able to map objects at higher levels of relational abstraction than individuals born around 1990. Polytomous Rasch models verify predicted violations of measurement invariance, as raw scores are found to underestimate the number of analogical rules inferred by members of the earlier cohort relative to members of the later cohort who achieve the same overall score. The work provides a plausible cognitive account of the Flynn effect, furthers understanding of the cognition of matrix reasoning, and underscores the need to consider how test-takers select item responses. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  9. A Latent Class Approach to Estimating Test-Score Reliability

    Science.gov (United States)

    van der Ark, L. Andries; van der Palm, Daniel W.; Sijtsma, Klaas

    2011-01-01

    This study presents a general framework for single-administration reliability methods, such as Cronbach's alpha, Guttman's lambda-2, and method MS. This general framework was used to derive a new approach to estimating test-score reliability by means of the unrestricted latent class model. This new approach is the latent class reliability…

  10. Computerized scoring algorithms for the Autobiographical Memory Test.

    Science.gov (United States)

    Takano, Keisuke; Gutenbrunner, Charlotte; Martens, Kris; Salmon, Karen; Raes, Filip

    2018-02-01

    Reduced specificity of autobiographical memories is a hallmark of depressive cognition. Autobiographical memory (AM) specificity is typically measured by the Autobiographical Memory Test (AMT), in which respondents are asked to describe personal memories in response to emotional cue words. Due to this free descriptive responding format, the AMT relies on experts' hand scoring for subsequent statistical analyses. This manual coding potentially impedes research activities in big data analytics such as large epidemiological studies. Here, we propose computerized algorithms to automatically score AM specificity for the Dutch (adult participants) and English (youth participants) versions of the AMT by using natural language processing and machine learning techniques. The algorithms showed reliable performances in discriminating specific and nonspecific (e.g., overgeneralized) autobiographical memories in independent testing data sets (area under the receiver operating characteristic curve > .90). Furthermore, outcome values of the algorithms (i.e., decision values of support vector machines) showed a gradient across similar (e.g., specific and extended memories) and different (e.g., specific memory and semantic associates) categories of AMT responses, suggesting that, for both adults and youth, the algorithms well capture the extent to which a memory has features of specific memories. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  11. Your move: The effect of chess on mathematics test scores.

    Science.gov (United States)

    Rosholm, Michael; Mikkelsen, Mai Bjørnskov; Gumede, Kamilla

    2017-01-01

    We analyse the effect of substituting a weekly mathematics lesson in primary school grades 1-3 with a lesson in mathematics based on chess instruction. We use data from the City of Aarhus in Denmark, combining test score data with a comprehensive data set obtained from administrative registers. We use two different methodological approaches to identify and estimate treatment effects and we tend to find positive effects, indicating that knowledge acquired through chess play can be transferred to the domain of mathematics. We also find larger impacts for unhappy children and children who are bored in school, perhaps because chess instruction facilitates learning by providing an alternative approach to mathematics for these children. The results are encouraging and suggest that chess may be an important and effective tool for improving mathematical capacity in young students.

  12. Dielectric sample with two-layer charge distribution for space charge calibration purposes

    DEFF Research Database (Denmark)

    Holbøll, Joachim; Henriksen, Mogens; Rasmussen, C.

    2002-01-01

    This paper describes a dielectric test sample with two very narrow concentrations of bulk charges, achieved by two internal electrodes that do not affect the acoustical properties of the sample, a fact important for optimal application of most space charge measuring systems. Space charge...

  13. Statistical energy as a tool for binning-free, multivariate goodness-of-fit tests, two-sample comparison and unfolding

    International Nuclear Information System (INIS)

    Aslan, B.; Zech, G.

    2005-01-01

    We introduce the novel concept of statistical energy as a statistical tool. We define statistical energy of statistical distributions in a similar way as for electric charge distributions. Charges of opposite sign are in a state of minimum energy if they are equally distributed. This property is used to check whether two samples belong to the same parent distribution, to define goodness-of-fit tests and to unfold distributions distorted by measurement. The approach is binning-free and especially powerful in multidimensional applications
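
    The abstract gives no formulas. One form of the two-sample energy statistic described in the energy-test literature, restricted here to one dimension and using a logarithmic distance function R(r) = −ln r, can be sketched as follows. The function name, the cutoff parameter and the data are assumptions for illustration, and significance would in practice be assessed by a permutation test, which is omitted.

```python
import math


def energy_statistic(x, y, cutoff=1e-9):
    """Two-sample energy statistic with R(r) = -ln(r); larger values suggest different parents."""
    def r(a, b):
        # The cutoff avoids log(0) for coincident points (an implementation assumption)
        return -math.log(max(abs(a - b), cutoff))

    n, m = len(x), len(y)
    # "Self-energy" of each sample, summed over distinct pairs
    within_x = sum(r(x[i], x[j]) for i in range(n) for j in range(i + 1, n)) / n ** 2
    within_y = sum(r(y[i], y[j]) for i in range(m) for j in range(i + 1, m)) / m ** 2
    # Interaction energy between the two samples (opposite "charge")
    cross = sum(r(a, b) for a in x for b in y) / (n * m)
    return within_x + within_y - cross


# Hypothetical samples: well separated versus interleaved
phi_separated = energy_statistic([0.0, 0.1, 0.2, 0.3], [10.0, 10.1, 10.2, 10.3])
phi_mixed = energy_statistic([0.0, 0.2, 0.4, 0.6], [0.1, 0.3, 0.5, 0.7])
print(phi_separated > phi_mixed)
```

    No binning is involved: the statistic depends only on pairwise distances, which is why the same construction carries over to multidimensional data once `abs(a - b)` is replaced by a suitable norm.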

  14. Evaluation of the Discrepancy between the European Pharmacopoeia Test and an Adopted United States Pharmacopoeia Test Regarding the Weight Uniformity of Scored Tablet Halves: Is Harmonization Required?

    Science.gov (United States)

    Zaid, Abdel Naser; Ghoush, Abeer Abu; Al-Ramahi, Rowa'; Are'r, Mohammed

    2012-01-01

    The aim of this study was to evaluate whether there exists any discrepancy between the European Pharmacopoeia (Ph. Eur.) and adopted United States Pharmacopeia (USP) tests concerning the weight uniformity measurements of tablet halves after splitting. The USP method does not contain provisions to evaluate split tablets, so here we adopt their whole tablet weight uniformity method. Twenty-nine different commercial scored tablets (local and imported) were divided. The split units were individually weighed and the relative standard deviation (RSD) for each product was calculated and then evaluated according to both the adopted USP and the Ph. Eur. tests of weight uniformity. Twenty out of the 29 products tested failed the USP test, while 14 of them failed the Ph. Eur. test. Nine products passed both the USP and Ph. Eur. tests. Six products passed the Ph. Eur. test but failed the USP test, with all of these products having an RSD greater than 6%. The correlation coefficient between the weight and content of split halves for three randomly selected products (corotenol 100 mg, corotenol 50 mg, and lorazepam 2.5 mg) was found to be 0.986, 0.998, and 0.72, respectively. A clear difference can be seen between outcomes obtained by the two compendial tablet splitting methods with regard to weight uniformity. Results from the USP test showed that tighter measures are needed to pass the test. Our results argue that the Ph. Eur. should revise the existing weight uniformity test on scored tablets to include the RSD parameter in it. The USP should include this adopted test as a specific test for scored tablet halves, not just whole tablets. Manufacturers in some cases will need to improve the quality of the produced scored tablets in order to pass the USP test, especially those with low therapeutic indices. Finally, harmonization between the pharmacopoeias regarding the weight uniformity testing of split tablets is warranted.
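
    The relative standard deviation used to compare products is the sample standard deviation expressed as a percentage of the mean. A minimal sketch follows; the function name and the tablet-half weights are hypothetical and are not data from the study, and no pharmacopoeial acceptance limit is encoded.

```python
from statistics import mean, stdev


def relative_standard_deviation(weights):
    """RSD in percent: sample standard deviation divided by the mean, times 100."""
    return 100.0 * stdev(weights) / mean(weights)


# Hypothetical weights (mg) of five tablet halves
half_weights_mg = [248.0, 252.0, 250.0, 255.0, 245.0]
rsd_percent = relative_standard_deviation(half_weights_mg)
print(round(rsd_percent, 2))
```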

  15. America's Mediocre Test Scores: Education Crisis or Poverty Crisis?

    Science.gov (United States)

    Petrilli, Michael J.; Wright, Brandon L.

    2016-01-01

    At a time when the national conversation is focused on lagging upward mobility, it is no surprise that many educators point to poverty as the explanation for mediocre test scores among U.S. students compared to those of students in other countries. If American teachers in struggling U.S. schools taught in Finland, says Finnish educator Pasi…

  16. Comparison of physical therapy anatomy performance and anxiety scores in timed and untimed practical tests.

    Science.gov (United States)

    Schwartz, Sarah M; Evans, Cathy; Agur, Anne M R

    2015-01-01

    Students in health care professional programs face many stressful tests that determine successful completion of their program. Test anxiety during these high stakes examinations can affect working memory and lead to poor outcomes. Methods of decreasing test anxiety include lengthening the time available to complete examinations or evaluating students using untimed examinations. There is currently no consensus in the literature regarding whether untimed examinations provide a benefit to test performance in clinical anatomy. This study aimed to determine the impact of timed versus untimed practical tests on Master of Physical Therapy student anatomy performance and test anxiety. Test anxiety was measured using the State-Trait Anxiety Inventory (STAI). Differences in performance, anxiety scores, and time taken were compared using paired sample Student's t-tests. Eighty-one of the 84 students completed the study and provided feedback. Students performed significantly higher on the untimed test (P = 0.005), with a significant reduction in test anxiety (P anxiety. If the intended goal of evaluating health care professional students is to determine fundamental competencies, these factors should be considered when designing future curricula. © 2014 American Association of Anatomists.

  17. The Implications of Family Size and Birth Order for Test Scores and Behavioral Development

    Science.gov (United States)

    Silles, Mary A.

    2010-01-01

    This article, using longitudinal data from the National Child Development Study, presents new evidence on the effects of family size and birth order on test scores and behavioral development at age 7, 11 and 16. Sibling size is shown to have an adverse causal effect on test scores and behavioral development. For any given family size, first-borns…

  18. Estimation of sample size and testing power (Part 3).

    Science.gov (United States)

    Hu, Liang-ping; Bao, Xiao-lei; Guan, Xue; Zhou, Shi-guo

    2011-12-01

    This article introduces the definition and sample size estimation of three special tests (namely, non-inferiority test, equivalence test and superiority test) for qualitative data with the design of one factor with two levels having a binary response variable. Non-inferiority test refers to the research design of which the objective is to verify that the efficacy of the experimental drug is not clinically inferior to that of the positive control drug. Equivalence test refers to the research design of which the objective is to verify that the experimental drug and the control drug have clinically equivalent efficacy. Superiority test refers to the research design of which the objective is to verify that the efficacy of the experimental drug is clinically superior to that of the control drug. By specific examples, this article introduces formulas of sample size estimation for the three special tests, and their SAS realization in detail.
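
    The article's formulas are not reproduced in this record. For the non-inferiority case, a commonly used normal-approximation formula shifts the expected difference by the margin and uses a one-sided significance level. The sketch below assumes that formula; the function name, rates and margin are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_noninferiority(p_exp, p_ctrl, margin, alpha=0.025, power=0.80):
    """Per-group sample size to show p_exp - p_ctrl > -margin (one-sided, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p_exp * (1.0 - p_exp) + p_ctrl * (1.0 - p_ctrl)
    # Distance between the assumed true difference and the non-inferiority boundary
    effect = p_exp - p_ctrl + margin
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical: both drugs 80% effective, non-inferiority margin of 10 percentage points
n_noninferiority = n_per_group_noninferiority(0.80, 0.80, margin=0.10)
print(n_noninferiority)
```

    A narrower margin sharply increases the required sample size, because the margin enters the denominator squared.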

  19. Test-retest variability of Randot stereoacuity measures gathered in an unselected sample of UK primary school children.

    Science.gov (United States)

    Adler, Paul; Scally, Andrew J; Barrett, Brendan T

    2012-05-01

    To determine the test-retest reliability of the Randot stereoacuity test when used as part of vision screening in schools. Randot stereoacuity (graded-circles) and logMAR visual acuity measures were gathered in an unselected sample of 139 children (aged 4-12, mean 8.1±2.1 years) in two schools. Randot testing was repeated on two occasions (average interval between successive tests 8 days, range: 1-21 days). Three Randot scores were obtained in 97.8% of children. Randot stereoacuity improved by an average of one plate (ie, one test level) on repeat testing but was little changed when tested on the third occasion. Within-subject variability was up to three test levels on repeat testing. When stereoacuity was categorised as 'fine', 'intermediate' or 'coarse', the greatest variability was found among younger children who exhibited 'intermediate' or 'coarse'/nil stereopsis on initial testing. Whereas 90.8% of children with 'fine' stereopsis (≤50 arc-seconds) on the first test exhibited 'fine' stereopsis on both subsequent tests, only ∼16% of children with 'intermediate' (>50 but ≤140 arc-seconds) or 'coarse'/nil (≥200 arc-seconds) stereoacuity on initial testing exhibited stable test results on repeat testing. Children exhibiting abnormal stereoacuity on initial testing are very likely to exhibit a normal result when retested. The value of a single, abnormal Randot graded-circles stereoacuity measure from school screening is therefore questionable.

  20. Cost-effectiveness of one versus two sample faecal immunochemical testing for colorectal cancer screening

    NARCIS (Netherlands)

    S.L. Goede (Luuk); A.H.C. Roon (Aafke); J.C.I.Y. Reijerink (Jacqueline); A.J. van Vuuren (Hanneke); I. Lansdorp-Vogelaar (Iris); J.D.F. Habbema (Dik); E.J. Kuipers (Ernst); M.E. van Leerdam (Monique); M. van Ballegooijen (Marjolein)

    2013-01-01

    Objective: The sensitivity and specificity of a single faecal immunochemical test (FIT) are limited. The performance of FIT screening can be improved by increasing the screening frequency or by providing more than one sample in each screening round. This study aimed to evaluate if

  1. Prostate ultrasound imaging: evaluation of a two-step scoring system in the diagnosis of prostate cancer.

    Science.gov (United States)

    Gao, Yong; Liao, Xin-Hong; Ma, Yan; Lu, Lu; Wei, Li-Yan; Yan, Xue

    2017-12-01

    This study aims to investigate the feasibility and performance of a two-step scoring system of ultrasound imaging in the diagnosis of prostate cancer. Seventy-five patients with 888 consecutive histopathologically verified lesions were included in this study. In step 1, an initial 5-point scoring system was developed based on conventional transrectal ultrasound (TRUS). In step 2, a final scoring system was evaluated according to contrast-enhanced transrectal ultrasound (CE-TRUS). Each lesion was evaluated using the two-step scoring system (step 1 + step 2) and compared with only using conventional TRUS (step 1). All 888 lesions were histologically verified: 315 of them were prostate cancer from 46 patients and 573 were benign prostatic hypertrophy (BPH) from 29 patients. According to the two-step scoring system, 284 lesions were upgraded and 130 lesions were downgraded from step 1 to step 2 (this means using step 2 to assess the results by step 1). However, 96 cases were improperly upgraded after step 2 and 48 malignant lesions were still missed after step 2 as score-1. For the two-step scoring system, the sensitivity, specificity, and accuracy were 84.7%, 83.2%, and 83.7%, respectively, versus 22.8%, 96.6%, and 70.4%, respectively, for conventional TRUS. The area under the ROC curve (AUC) for lesion diagnosis was 0.799-0.952 for the two-step scoring system, versus 0.479-0.712 for conventional TRUS. The difference in the diagnostic accuracy of the two-step scoring system and conventional TRUS was statistically significant (P < …). The two-step scoring system was straightforward to use and achieved a considerably accurate diagnostic performance for prostate cancer. The application of the two-step scoring system for prostate cancer is promising.
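
    As a rough check on how such figures are derived, the sketch below computes sensitivity, specificity and accuracy from confusion-matrix counts. It assumes that the 48 missed malignant lesions are the false negatives and the 96 improperly upgraded cases are the false positives of the two-step system; the abstract does not state this mapping explicitly, and the function name is illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of diseased lesions that were detected
    specificity = tn / (tn + fp)   # share of benign lesions correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed mapping of the abstract's counts for the two-step system:
# 315 malignant lesions with 48 missed, 573 benign lesions with 96 upgraded in error.
sens, spec, acc = diagnostic_metrics(tp=315 - 48, fn=48, tn=573 - 96, fp=96)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

    Under that assumption the three values land within about 0.1 percentage point of the reported 84.7%, 83.2% and 83.7%, which suggests the mapping is plausible but does not confirm it.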

  2. Your move: The effect of chess on mathematics test scores

    DEFF Research Database (Denmark)

    Rosholm, Michael; Mikkelsen, Mai Bjørnskov; Gumede, Kamilla Trille

    2017-01-01

    We analyse the effect of substituting a weekly mathematics lesson in primary school grades 1–3 with a lesson in mathematics based on chess instruction. We use data from the City of Aarhus in Denmark, combining test score data with a comprehensive data set obtained from administrative registers. We use two different methodological approaches to identify and estimate treatment effects and we tend to find positive effects, indicating that knowledge acquired through chess play can be transferred to the domain of mathematics. We also find larger impacts for unhappy children and children who are bored in school, perhaps because chess instruction facilitates learning by providing an alternative approach to mathematics for these children. The results are encouraging and suggest that chess may be an important and effective tool for improving mathematical capacity in young students.

  3. Your move: The effect of chess on mathematics test scores.

    Directory of Open Access Journals (Sweden)

    Michael Rosholm

    We analyse the effect of substituting a weekly mathematics lesson in primary school grades 1-3 with a lesson in mathematics based on chess instruction. We use data from the City of Aarhus in Denmark, combining test score data with a comprehensive data set obtained from administrative registers. We use two different methodological approaches to identify and estimate treatment effects and we tend to find positive effects, indicating that knowledge acquired through chess play can be transferred to the domain of mathematics. We also find larger impacts for unhappy children and children who are bored in school, perhaps because chess instruction facilitates learning by providing an alternative approach to mathematics for these children. The results are encouraging and suggest that chess may be an important and effective tool for improving mathematical capacity in young students.

  4. Agreement and conversion formula between mini-mental state examination and montreal cognitive assessment in an outpatient sample.

    Science.gov (United States)

    Helmi, Luqman; Meagher, David; O'Mahony, Edmond; O'Neill, Donagh; Mulligan, Owen; Murthy, Sutha; McCarthy, Geraldine; Adamis, Dimitrios

    2016-09-22

    To explore the agreement between the Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA) within community dwelling older patients attending an old age psychiatry service and to derive and test a conversion formula between the two scales. Prospective study of consecutive patients attending outpatient services. Both tests were administered by the same researcher on the same day in random order. The total sample (n = 135) was randomly divided into two groups: one to derive a conversion rule (n = 70) and a second (n = 65) in which this rule was tested. The agreement (Pearson's r) of MMSE and MoCA was 0.86 (P < 0.001), and Lin's concordance correlation coefficient (CCC) was 0.57 (95%CI: 0.45-0.66). In the second sample MoCA scores were converted to MMSE scores according to a conversion rule from the first sample which achieved agreement with the original MMSE scores of 0.89 (Pearson's r, P < 0.001) and CCC of 0.88 (95%CI: 0.82-0.92). Although the two scales overlap considerably, the agreement is modest. The conversion rule derived herein demonstrated promising accuracy and warrants further testing in other populations.
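
    The gap between a high Pearson's r and a modest CCC is easier to see with a small computation. The sketch below implements Lin's concordance correlation coefficient next to Pearson's r; the scores are made up for illustration and are not data from the study.

```python
from statistics import fmean


def pearson_and_ccc(x, y):
    """Return (Pearson r, Lin's concordance correlation coefficient)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    # Lin's coefficient uses 1/n (population) variances and covariance.
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    r = sxy / (sxx * syy) ** 0.5
    # The squared mean difference penalises systematic offsets between scales.
    ccc = 2 * sxy / (sxx + syy + (mx - my) ** 2)
    return r, ccc


# Made-up scores: y tracks x perfectly but sits 4 points lower.
x = [18, 22, 25, 27, 30]
y = [a - 4 for a in x]
r, ccc = pearson_and_ccc(x, y)
print(f"r={r:.2f} ccc={ccc:.2f}")
```

    A constant offset leaves Pearson's r at its maximum while pulling the CCC down, which is one way two scales can correlate strongly yet agree only modestly until a conversion rule is applied.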

  5. Differential Predictive Validity of High School GPA and College Entrance Test Scores for University Students in Yemen

    Science.gov (United States)

    Al-Hattami, Abdulghani Ali Dawod

    2012-01-01

    High school grade point average and college entrance test scores are two admission criteria that are currently used by most colleges in Yemen to select their prospective students. Given their widespread use, it is important to investigate their predictive validity to ensure the accuracy of the admission decisions in these institutions. This study…

  6. Predicting Freshman Grade Point Average From College Admissions Test Scores and State High School Test Scores

    OpenAIRE

    Koretz, Daniel; Yu, C; Mbekeani, Preeya Pandya; Langi, M.; Dhaliwal, Tasminda Kaur; Braslow, David Arthur

    2016-01-01

    The current focus on assessing “college and career readiness” raises an empirical question: How do high school tests compare with college admissions tests in predicting performance in college? We explored this using data from the City University of New York and public colleges in Kentucky. These two systems differ in the choice of college admissions test, the stakes for students on the high school test, and demographics. We predicted freshman grade point average (FGPA) from high school GPA an...

  7. A physical function test for use in the intensive care unit: validity, responsiveness, and predictive utility of the physical function ICU test (scored).

    Science.gov (United States)

    Denehy, Linda; de Morton, Natalie A; Skinner, Elizabeth H; Edbrooke, Lara; Haines, Kimberley; Warrillow, Stephen; Berney, Sue

    2013-12-01

    Several tests have recently been developed to measure changes in patient strength and functional outcomes in the intensive care unit (ICU). The original Physical Function ICU Test (PFIT) demonstrates reliability and sensitivity. The aims of this study were to further develop the original PFIT, to derive an interval score (the PFIT-s), and to test the clinimetric properties of the PFIT-s. A nested cohort study was conducted. One hundred forty-four and 116 participants performed the PFIT at ICU admission and discharge, respectively. Original test components were modified using principal component analysis. Rasch analysis examined the unidimensionality of the PFIT, and an interval score was derived. Correlations tested validity, and multiple regression analyses investigated predictive ability. Responsiveness was assessed using the effect size index (ESI), and the minimal clinically important difference (MCID) was calculated. The shoulder lift component was removed. Unidimensionality of combined admission and discharge PFIT-s scores was confirmed. The PFIT-s displayed moderate convergent validity with the Timed "Up & Go" Test (r=-.60), the Six-Minute Walk Test (r=.41), and the Medical Research Council (MRC) sum score (rho=.49). The ESI of the PFIT-s was 0.82, and the MCID was 1.5 points (interval scale range=0-10). A higher admission PFIT-s score was predictive of: an MRC score of ≥48, increased likelihood of discharge home, reduced likelihood of discharge to inpatient rehabilitation, and reduced acute care hospital length of stay. Scoring of sit-to-stand assistance required is subjective, and cadence cutpoints used may not be generalizable. The PFIT-s is a safe and inexpensive test of physical function with high clinical utility. It is valid, responsive to change, and predictive of key outcomes. It is recommended that the PFIT-s be adopted to test physical function in the ICU.

  8. Do candidate reactions relate to job performance or affect criterion-related validity? A multistudy investigation of relations among reactions, selection test scores, and job performance.

    Science.gov (United States)

    McCarthy, Julie M; Van Iddekinge, Chad H; Lievens, Filip; Kung, Mei-Chuan; Sinar, Evan F; Campion, Michael A

    2013-09-01

    Considerable evidence suggests that how candidates react to selection procedures can affect their test performance and their attitudes toward the hiring organization (e.g., recommending the firm to others). However, very few studies of candidate reactions have examined one of the outcomes organizations care most about: job performance. We attempt to address this gap by developing and testing a conceptual framework that delineates whether and how candidate reactions might influence job performance. We accomplish this objective using data from 4 studies (total N = 6,480), 6 selection procedures (personality tests, job knowledge tests, cognitive ability tests, work samples, situational judgment tests, and a selection inventory), 5 key candidate reactions (anxiety, motivation, belief in tests, self-efficacy, and procedural justice), 2 contexts (industry and education), 3 continents (North America, South America, and Europe), 2 study designs (predictive and concurrent), and 4 occupational areas (medical, sales, customer service, and technological). Consistent with previous research, candidate reactions were related to test scores, and test scores were related to job performance. Further, there was some evidence that reactions affected performance indirectly through their influence on test scores. Finally, in no cases did candidate reactions affect the prediction of job performance by increasing or decreasing the criterion-related validity of test scores. Implications of these findings and avenues for future research are discussed. PsycINFO Database Record (c) 2013 APA, all rights reserved

  9. Gleeble Testing of Tungsten Samples

    Science.gov (United States)

    2013-02-01

    temperature on an Instron load frame with a 222.41 kN (50 kip) load cell. The samples were compressed at the same strain rate as on the Gleeble... Table excerpt (columns: ID; % RE; initial density (cm³); density after compression (cm³); % change in density; test temperature): NT1, 0, 18.08, 18.27, 1.06, 1000; NT3, 0... 4.1 Nano-Tungsten: The results for the compression of the nano-tungsten samples are shown in tables 2 and 3 and figure 5. During testing, sample NT1

  10. Norm Block Sample Sizes: A Review of 17 Individually Administered Intelligence Tests

    Science.gov (United States)

    Norfolk, Philip A.; Farmer, Ryan L.; Floyd, Randy G.; Woods, Isaac L.; Hawkins, Haley K.; Irby, Sarah M.

    2015-01-01

    The representativeness, recency, and size of norm samples strongly influence the accuracy of inferences drawn from their scores. Inadequate norm samples may lead to inflated or deflated scores for individuals and poorer prediction of developmental and academic outcomes. The purpose of this study was to apply Kranzler and Floyd's method for…

  11. Validation of the Cognition Test Battery for Spaceflight in a Sample of Highly Educated Adults.

    Science.gov (United States)

    Moore, Tyler M; Basner, Mathias; Nasrini, Jad; Hermosillo, Emanuel; Kabadi, Sushila; Roalf, David R; McGuire, Sarah; Ecker, Adrian J; Ruparel, Kosha; Port, Allison M; Jackson, Chad T; Dinges, David F; Gur, Ruben C

    2017-10-01

    Neuropsychological changes that may occur due to the environmental and psychological stressors of prolonged spaceflight motivated the development of the Cognition Test Battery. The battery was designed to assess multiple domains of neurocognitive functions linked to specific brain systems. Tests included in Cognition have been validated, but not in high-performing samples comparable to astronauts, which is an essential step toward ensuring their usefulness in long-duration space missions. We administered Cognition (on laptop and iPad) and the WinSCAT, counterbalanced for order and version, in a sample of 96 subjects (50% women; ages 25-56 yr) with at least a Master's degree in science, technology, engineering, or mathematics (STEM). We assessed the associations of age, sex, and administration device with neurocognitive performance, and compared the scores on the Cognition battery with those of WinSCAT. Confirmatory factor analysis compared the structure of the iPad and laptop administration methods using Wald tests. Age was associated with longer response times (mean β = 0.12) and less accurate (mean β = -0.12) performance, women had longer response times on psychomotor (β = 0.62), emotion recognition (β = 0.30), and visuo-spatial (β = 0.48) tasks, men outperformed women on matrix reasoning (β = -0.34), and performance on an iPad was generally faster (mean β = -0.55). The WinSCAT appeared heavily loaded with tasks requiring executive control, whereas Cognition assessed a larger variety of neurocognitive domains. Overall results supported the interpretation of Cognition scores as measuring their intended constructs in high performing astronaut analog samples. Moore TM, Basner M, Nasrini J, Hermosillo E, Kabadi S, Roalf DR, McGuire S, Ecker AJ, Ruparel K, Port AM, Jackson CT, Dinges DF, Gur RC. Validation of the Cognition Test Battery for spaceflight in a sample of highly educated adults. Aerosp Med Hum Perform. 2017; 88(10):937-946.

  12. Increasing the reliability of the fluid/crystallized difference score from the Kaufman Adolescent and Adult Intelligence Test with reliable component analysis.

    Science.gov (United States)

    Caruso, J C

    2001-06-01

    The unreliability of difference scores is a well documented phenomenon in the social sciences and has led researchers and practitioners to interpret differences cautiously, if at all. In the case of the Kaufman Adult and Adolescent Intelligence Test (KAIT), the unreliability of the difference between the Fluid IQ and the Crystallized IQ is due to the high correlation between the two scales. The consequences of the lack of precision with which differences are identified are wide confidence intervals and unpowerful significance tests (i.e., large differences are required to be declared statistically significant). Reliable component analysis (RCA) was performed on the subtests of the KAIT in order to address these problems. RCA is a new data reduction technique that results in uncorrelated component scores with maximum proportions of reliable variance. Results indicate that the scores defined by RCA have discriminant and convergent validity (with respect to the equally weighted scores) and that differences between the scores, derived from a single testing session, were more reliable than differences derived from equal weighting for each age group (11-14 years, 15-34 years, 35-85+ years). This reliability advantage results in narrower confidence intervals around difference scores and smaller differences required for statistical significance.
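
    The link between a high inter-scale correlation and an unreliable difference score follows from a classical test theory formula. The sketch below uses the textbook version that assumes both scales have equal variances; the reliabilities and correlations are hypothetical values chosen for illustration, not KAIT statistics, and this is not the reliable component analysis described in the abstract.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Classical reliability of the difference X - Y, assuming equal variances.

    r_xx and r_yy are the reliabilities of the two scales and r_xy is the
    correlation between them.
    """
    if not -1 < r_xy < 1:
        raise ValueError("r_xy must lie strictly between -1 and 1")
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


# Hypothetical scales, each with reliability 0.95, at two correlation levels.
moderate = difference_score_reliability(0.95, 0.95, 0.75)
high = difference_score_reliability(0.95, 0.95, 0.85)
print(f"r_xy=0.75 -> {moderate:.2f}, r_xy=0.85 -> {high:.2f}")
```

    With the scale reliabilities held fixed, raising the correlation between the scales lowers the reliability of their difference, which is the problem the abstract attributes to the Fluid and Crystallized IQ scales.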

  13. The Implementation of Role-Playing Model in Principles of Finance Accounting Learning to Improve Students’ Enjoyment and Students’ Test Scores

    Directory of Open Access Journals (Sweden)

    L. Saptono

    2010-01-01

    This study is classroom action research. Its goal is to improve students’ enjoyment and test scores by implementing the role-playing method. The research was conducted in the Accounting Education Study Program of Sanata Dharma University in the odd semester of the 2010/2011 academic year. The participants were divided into two classes: the first received the treatment, and the second served as the control. The results showed an improvement in students’ enjoyment and test scores in the class that implemented the role-playing method.

  14. Allele-sharing models: LOD scores and accurate linkage tests.

    Science.gov (United States)

    Kong, A; Cox, N J

    1997-11-01

    Starting with a test statistic for linkage analysis based on allele sharing, we propose an associated one-parameter model. Under general missing-data patterns, this model allows exact calculation of likelihood ratios and LOD scores and has been implemented by a simple modification of existing software. Most important, accurate linkage tests can be performed. Using an example, we show that some previously suggested approaches to handling less than perfectly informative data can be unacceptably conservative. Situations in which this model may not perform well are discussed, and an alternative model that requires additional computations is suggested.

  15. Pharmacokinetic and pharmacodynamic effects of two omeprazole formulations on stomach pH and gastric ulcer scores.

    Science.gov (United States)

    Raidal, S L; Andrews, F M; Nielsen, S G; Trope, G

    2017-11-01

    Limited data are available on the relative pharmacokinetics and pharmacodynamics of different omeprazole formulations. To compare pharmacokinetic and pharmacodynamic effects of a novel omeprazole formulation against a currently registered product. Masked 2 period, 2 treatment crossover. Twelve clinically healthy horses were studied over two 6-day treatment periods. Horses were randomly assigned to receive a novel omeprazole paste (Ulcershield: ULS) or a currently registered reference omeprazole product (OMO). Gastric pH was measured continuously for 10 h on the day prior to commencing treatment (Day -1) and after 6 days of oral treatment (Day 5) using in situ antimony pH probes within an indwelling nasogastric tube. Plasma pharmacokinetics were determined on Days 0 and 6. Treatment significantly (P < …) … ulcer severity scores (both P = 0.004), with no difference between treatments (P = 0.688). Comparison of mean log area under time-plasma concentration curves demonstrated that, although the lower limit of the 90% confidence interval was within the -20% limit for bioequivalence, the upper limit was exceeded, suggesting that the test product could have greater bioavailability than the reference product. The small sample size, large interhorse plasma omeprazole concentrations, and low bioavailability of omeprazole impacted the sensitivity of the bioequivalence analysis. ULS matched or slightly exceeded OMO plasma concentrations. Both products resulted in equivalent increases in gastric pH, gastric pH profiles and decrease in gastric ulcer scores. Thus, ULS was pharmacodynamically equivalent to OMO and was associated with an equivalent beneficial effect on gastric squamous mucosal ulceration. © 2017 EVJ Ltd.

  16. Comparing the Effects of Elementary Music and Visual Arts Lessons on Standardized Mathematics Test Scores

    Science.gov (United States)

    King, Molly Elizabeth

    2016-01-01

    The purpose of this quantitative, causal-comparative study was to compare the effect elementary music and visual arts lessons had on third through sixth grade standardized mathematics test scores. Inferential statistics were used to compare the differences between test scores of students who took in-school, elementary, music instruction during the…

  17. Optimizing Scoring and Sampling Methods for Assessing Built Neighborhood Environment Quality in Residential Areas

    Science.gov (United States)

    Adu-Brimpong, Joel; Coffey, Nathan; Ayers, Colby; Berrigan, David; Yingling, Leah R.; Thomas, Samantha; Mitchell, Valerie; Ahuja, Chaarushi; Rivers, Joshua; Hartz, Jacob; Powell-Wiley, Tiffany M.

    2017-01-01

    Optimization of existing measurement tools is necessary to explore links between aspects of the neighborhood built environment and health behaviors or outcomes. We evaluate a scoring method for virtual neighborhood audits utilizing the Active Neighborhood Checklist (the Checklist), a neighborhood audit measure, and assess street segment representativeness in low-income neighborhoods. Eighty-two home neighborhoods of Washington, D.C. Cardiovascular Health/Needs Assessment (NCT01927783) participants were audited using Google Street View imagery and the Checklist (five sections with 89 total questions). Twelve street segments per home address were assessed for (1) Land-Use Type; (2) Public Transportation Availability; (3) Street Characteristics; (4) Environment Quality and (5) Sidewalks/Walking/Biking features. Checklist items were scored 0–2 points/question. A combinations algorithm was developed to assess street segments’ representativeness. Spearman correlations were calculated between built environment quality scores and Walk Score®, a validated neighborhood walkability measure. Street segment quality scores ranged 10–47 (Mean = 29.4 ± 6.9) and overall neighborhood quality scores, 172–475 (Mean = 352.3 ± 63.6). Walk Scores® ranged 0–91 (Mean = 46.7 ± 26.3). Street segment combinations’ correlation coefficients ranged 0.75–1.0. Significant positive correlations were found between overall neighborhood quality scores, four of the five Checklist subsection scores, and Walk Scores® (r = 0.62, p < …) … health behaviors and outcomes. PMID:28282878

  18. Testing Two Nutrient Profiling Models of Labelled Foods and Beverages Marketed in Turkey.

    Science.gov (United States)

    Dikmen, Derya; Kızıl, Mevlüde; Uyar, Muhemmet Fatih; Pekcan, Gülden

    2015-06-01

    The objective of this study was to evaluate the nutrient profile of labelled foods and also understand the application of two international nutrient profiling models of labelled foods and beverages. WXYfm and NRF 9.3 nutrient profiling models were used to evaluate 3,171 labelled foods and beverages of 38 food categories and 500 different brands. According to the WXYfm model, pasta, grains and legumes and frozen foods had the best scores whereas oils had the worst scores. According to the NRF 9.3 model per 100 kcal, the best scores were obtained for frozen foods, grains and legumes and milk products whereas the confectionery foods had the worst scores. According to NRF 9.3 per serving size, grains and legumes had the best scores and flavoured milks had the worst scores. A comparison of WXYfm and NRF 9.3 nutrient profiling models ranked scores showed a high positive correlation (p=0.01). The two nutrient models evaluated yielded similar results. Further studies are needed to test other category specific nutrient profiling models in order to understand how different models behave. Copyright© by the National Institute of Public Health, Prague 2015.

  19. Team performance in resuscitation teams: Comparison and critique of two recently developed scoring tools

    Science.gov (United States)

    McKay, Anthony; Walker, Susanna T.; Brett, Stephen J.; Vincent, Charles; Sevdalis, Nick

    2012-01-01

    Background and aim: Following high profile errors resulting in patient harm and attracting negative publicity, the healthcare sector has begun to focus on training non-technical teamworking skills as one way of reducing the rate of adverse events. Within the area of resuscitation, two tools have been developed recently aiming to assess these skills – TEAM and OSCAR. The aims of the study reported here were: (1) to determine the inter-rater reliability of the tools in assessing performance within the context of resuscitation; (2) to correlate scores of the same resuscitation team episodes using both tools, thereby determining their concurrent validity within the context of resuscitation; and (3) to carry out a critique of both tools and establish how best each one may be utilised. Methods: The study consisted of two phases – reliability assessment; and content comparison and correlation. Assessments were made by two resuscitation experts, who watched 24 pre-recorded resuscitation simulations, and independently rated team behaviours using both tools. The tools were critically appraised, and correlation between overall score surrogates was assessed. Results: Both OSCAR and TEAM achieved high levels of inter-rater reliability (in the form of adequate intra-class coefficients) and minor significant differences between Wilcoxon tests. Comparison of the scores from both tools demonstrated a high degree of correlation (and hence concurrent validity). Finally, critique of each tool highlighted differences in length and complexity. Conclusion: Both OSCAR and TEAM can be used to assess resuscitation teams in a simulated environment, with the tools correlating well with one another. We envisage a role for both tools – with TEAM giving a quick, global assessment of the team, but OSCAR enabling more detailed breakdown of the assessment, facilitating feedback, and identifying areas of weakness for future training. PMID:22561464

  20. Clinical score and rapid antigen detection test to guide antibiotic use for sore throats: randomised controlled trial of PRISM (primary care streptococcal management).

    Science.gov (United States)

    Little, Paul; Hobbs, F D Richard; Moore, Michael; Mant, David; Williamson, Ian; McNulty, Cliodna; Cheng, Ying Edith; Leydon, Geraldine; McManus, Richard; Kelly, Joanne; Barnett, Jane; Glasziou, Paul; Mullee, Mark

    2013-10-10

    To determine the effect of clinical scores that predict streptococcal infection or rapid streptococcal antigen detection tests compared with delayed antibiotic prescribing. Open adaptive pragmatic parallel group randomised controlled trial. Primary care in United Kingdom. Patients aged ≥ 3 with acute sore throat. An internet programme randomised patients to targeted antibiotic use according to: delayed antibiotics (the comparator group for analyses), clinical score, or antigen test used according to clinical score. During the trial a preliminary streptococcal score (score 1, n=1129) was replaced by a more consistent score (score 2, n=631; features: fever during previous 24 hours; purulence; attends rapidly (within three days after onset of symptoms); inflamed tonsils; no cough/coryza (acronym FeverPAIN). Symptom severity reported by patients on a 7 point Likert scale (mean severity of sore throat/difficulty swallowing for days two to four after the consultation (primary outcome)), duration of symptoms, use of antibiotics. For score 1 there were no significant differences between groups. For score 2, symptom severity was documented in 80% (168/207 (81%) in delayed antibiotics group; 168/211 (80%) in clinical score group; 166/213 (78%) in antigen test group). Reported severity of symptoms was lower in the clinical score group (-0.33, 95% confidence interval -0.64 to -0.02; P=0.04), equivalent to one in three rating sore throat a slight versus moderate problem, with a similar reduction for the antigen test group (-0.30, -0.61 to -0.00; P=0.05). Symptoms rated moderately bad or worse resolved significantly faster in the clinical score group (hazard ratio 1.30, 95% confidence interval 1.03 to 1.63) but not the antigen test group (1.11, 0.88 to 1.40). In the delayed antibiotics group, 75/164 (46%) used antibiotics. Use of antibiotics in the clinical score group (60/161) was 29% lower (adjusted risk ratio 0.71, 95% confidence interval 0.50 to 0.95; P=0.02) and in the

  1. Test and Score Data Summary for TOEFL® Internet-Based and Paper-Based Tests. January 2008-December 2008 Test Data

    Science.gov (United States)

    Educational Testing Service, 2008

    2008-01-01

    The Test of English as a Foreign Language™, better known as TOEFL®, is designed to measure the English-language proficiency of people whose native language is not English. TOEFL scores are accepted by more than 6,000 colleges, universities, and licensing agencies in 130 countries. The test is also used by governments, and scholarship and…

  2. Use of Standardized Test Scores to Predict Success in a Computer Applications Course

    Science.gov (United States)

    Harris, Robert V.; King, Stephanie B.

    2016-01-01

    The purpose of this study was to see if a relationship existed between American College Testing (ACT) scores (i.e., English, reading, mathematics, science reasoning, and composite) and student success in a computer applications course at a Mississippi community college. The study showed that while the ACT scores were excellent predictors of…

  3. Similar predictions of etravirine sensitivity regardless of genotypic testing method used: comparison of available scoring systems.

    Science.gov (United States)

    Vingerhoets, Johan; Nijs, Steven; Tambuyzer, Lotke; Hoogstoel, Annemie; Anderson, David; Picchio, Gaston

    2012-01-01

    The aims of this study were to compare various genotypic scoring systems commonly used to predict virological outcome to etravirine, and examine their concordance with etravirine phenotypic susceptibility. Six etravirine genotypic scoring systems were assessed: Tibotec 2010 (based on 20 mutations; TBT 20), Monogram, Stanford HIVdb, ANRS, Rega (based on 37, 30, 27 and 49 mutations, respectively) and virco®TYPE HIV-1 (predicted fold change based on genotype). Samples from treatment-experienced patients who participated in the DUET trials and with both genotypic and phenotypic data (n=403) were assessed using each scoring system. Results were retrospectively correlated with virological response in DUET. κ coefficients were calculated to estimate the degree of correlation between the different scoring systems. Correlation between the five scoring systems and the TBT 20 system was approximately 90%. Virological response by etravirine susceptibility was comparable regardless of which scoring system was utilized, with 70-74% of DUET patients determined as susceptible to etravirine by the different scoring systems achieving plasma viral load <50 HIV-1 RNA copies/ml. In samples classed as phenotypically susceptible to etravirine (fold change in 50% effective concentration ≤3), correlations with genotypic score were consistently high across scoring systems (≥70%). In general, the etravirine genotypic scoring systems produced similar results, and genotype-phenotype concordance was high. As such, phenotypic interpretations, and in their absence all genotypic scoring systems investigated, may be used to reliably predict the activity of etravirine.
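
    For readers unfamiliar with κ coefficients, the sketch below computes Cohen's kappa for two systems that label the same samples. It assumes the unweighted two-rater form; the abstract does not say which variant the authors used, and the susceptible/resistant calls here are invented for illustration.

```python
from collections import Counter


def cohens_kappa(calls_a, calls_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    # Chance agreement: product of the two systems' marginal label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Invented calls from two hypothetical scoring systems (S = susceptible, R = resistant).
system_a = ["S", "S", "S", "S", "S", "S", "S", "R", "R", "R"]
system_b = ["S", "S", "S", "S", "S", "S", "R", "R", "R", "S"]
kappa = cohens_kappa(system_a, system_b)
print(f"kappa={kappa:.2f}")
```

    Here the two systems agree on 8 of 10 samples, yet kappa is noticeably lower than 0.8 because much of that agreement is expected by chance when most samples are susceptible. The function divides by zero if both systems assign every sample the same single label, so real use would need a guard for that case.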

  4. A Comparison of the Approaches of Generalizability Theory and Item Response Theory in Estimating the Reliability of Test Scores for Testlet-Composed Tests

    Science.gov (United States)

    Lee, Guemin; Park, In-Yong

    2012-01-01

    Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…

  5. Dichotomous scoring of Trails B in patients referred for a dementia evaluation.

    Science.gov (United States)

    Schmitt, Andrew L; Livingston, Ronald B; Smernoff, Eric N; Waits, Bethany L; Harris, James B; Davis, Kent M

    2010-04-01

    The Trail Making Test is a popular neuropsychological test and its interpretation has traditionally used time-based scores. This study examined an alternative approach to scoring that is simply based on the examinees' ability to complete the test. If an examinee is able to complete Trails B successfully, they are coded as "completers"; if not, they are coded as "noncompleters." To assess this approach to scoring Trails B, the performance of 97 diagnostically heterogeneous individuals referred for a dementia evaluation was examined. In this sample, 55 individuals successfully completed Trails B and 42 individuals were unable to complete it. Point-biserial correlations indicated a moderate-to-strong association (r(pb)=.73) between the Trails B completion variable and the Total Scale score of the Repeatable Battery for the Assessment of Neurological Status (RBANS), which was larger than the correlation between the Trails B time-based score and the RBANS Total Scale score (r(pb)=.60). As a screen for dementia status, Trails B completion showed a sensitivity of 69% and a specificity of 100% in this sample. These results suggest that dichotomous scoring of Trails B might provide a brief and clinically useful measure of dementia status.
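
    Sensitivity and specificity, as reported for Trails B completion, follow directly from a two-by-two table of screening result against diagnosis. The sketch below shows the arithmetic; the counts are hypothetical and are not the study's data, since the abstract does not give the underlying table.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    # Sensitivity: share of truly affected cases that the screen flags.
    sensitivity = true_pos / (true_pos + false_neg)
    # Specificity: share of unaffected cases that the screen clears.
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(true_pos=30, false_neg=10,
                                     true_neg=45, false_pos=5)
```

    A specificity of 100%, as in the abstract, corresponds to zero false positives in the sample.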

  6. Racial Differences in Mathematics Test Scores for Advanced Mathematics Students

    Science.gov (United States)

    Minor, Elizabeth Covay

    2016-01-01

    Research on achievement gaps has found that achievement gaps are larger for students who take advanced mathematics courses compared to students who do not. Focusing on the advanced mathematics student achievement gap, this study found that African American advanced mathematics students have significantly lower test scores and are less likely to be…

  7. Acceptance test procedure for core sample trucks

    International Nuclear Information System (INIS)

    Smalley, J.L.

    1995-01-01

    The purpose of this Acceptance Test Procedure is to provide instruction and documentation for acceptance testing of the rotary mode core sample trucks, HO-68K-4600 and HO-68K-4647. The rotary mode core sample trucks were based upon the design of the second core sample truck (HO-68K-4345) which was constructed to implement rotary mode sampling of the waste tanks at Hanford. Acceptance testing of the rotary mode core sample trucks will verify that the design requirements have been met. All testing will be non-radioactive and stand-in materials shall be used to simulate waste tank conditions. Compressed air will be substituted for nitrogen during the majority of testing, with nitrogen being used only for flow characterization.

  8. Adaptive designs for the one-sample log-rank test.

    Science.gov (United States)

    Schmidt, Rene; Faldum, Andreas; Kwiecien, Robert

    2017-09-22

    Traditional designs in phase IIa cancer trials are single-arm designs with a binary outcome, for example, tumor response. In some settings, however, a time-to-event endpoint might appear more appropriate, particularly in the presence of loss to follow-up. Then the one-sample log-rank test might be the method of choice. It allows the survival curve of the patients under treatment to be compared with a prespecified reference survival curve. The reference curve usually represents the expected survival under the standard of care. In this work, convergence of the one-sample log-rank statistic to Brownian motion is proven using Rebolledo's martingale central limit theorem while accounting for staggered entry times of the patients. On this basis, a confirmatory adaptive one-sample log-rank test is proposed where provision is made for data dependent sample size reassessment. The focus is to apply the inverse normal method. This is done in two different directions. The first strategy exploits the independent increments property of the one-sample log-rank statistic. The second strategy is based on the patient-wise separation principle. It is shown by simulation that the proposed adaptive test might help to rescue an underpowered trial and at the same time lowers the average sample number (ASN) under the null hypothesis as compared to a single-stage fixed sample design. © 2017, The International Biometric Society.
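
    The inverse normal method named in this abstract combines stage-wise one-sided p-values with prespecified weights whose squares sum to one. The sketch below shows only that general two-stage combination rule; it does not reproduce the paper's one-sample log-rank statistics, and the function name and equal default weights are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def inverse_normal_combination(p1, p2, w1=sqrt(0.5), w2=sqrt(0.5)):
    """Combine two one-sided stage-wise p-values; return (z, combined_p)."""
    if abs(w1 * w1 + w2 * w2 - 1.0) > 1e-9:
        raise ValueError("weights must satisfy w1**2 + w2**2 == 1")
    std = NormalDist()  # standard normal distribution
    # Map each p-value to a z-score, then take the weighted sum.
    z = w1 * std.inv_cdf(1.0 - p1) + w2 * std.inv_cdf(1.0 - p2)
    return z, 1.0 - std.cdf(z)


# Two stages that are each borderline (p = 0.05) combine to stronger evidence.
z_combined, p_combined = inverse_normal_combination(0.05, 0.05)
```

    Because the weights are fixed in advance, the combined statistic remains standard normal under the null hypothesis even when the second-stage sample size is chosen after looking at first-stage data.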

  9. School accountability and the black-white test score gap.

    Science.gov (United States)

    Gaddis, S Michael; Lauen, Douglas Lee

    2014-03-01

    Since at least the 1960s, researchers have closely examined the respective roles of families, neighborhoods, and schools in producing the black-white achievement gap. Although many researchers minimize the ability of schools to eliminate achievement gaps, the No Child Left Behind Act (NCLB) increased pressure on schools to do so by 2014. In this study, we examine the effects of NCLB's subgroup-specific accountability pressure on changes in black-white math and reading test score gaps using a school-level panel dataset on all North Carolina public elementary and middle schools between 2001 and 2009. Using difference-in-difference models with school fixed effects, we find that accountability pressure reduces black-white achievement gaps by raising mean black achievement without harming mean white achievement. We find no differential effects of accountability pressure based on the racial composition of schools, but schools with more affluent populations are the most successful at reducing the black-white math achievement gap. Thus, our findings suggest that school-based interventions have the potential to close test score gaps, but differences in school composition and resources play a significant role in the ability of schools to reduce racial inequality. Copyright © 2013 Elsevier Inc. All rights reserved.

  10. Risk stratification in middle-aged patients with congestive heart failure: prospective comparison of the Heart Failure Survival Score (HFSS) and a simplified two-variable model.

    Science.gov (United States)

    Zugck, C; Krüger, C; Kell, R; Körber, S; Schellberg, D; Kübler, W; Haass, M

    2001-10-01

    The performance of a US-American scoring system (Heart Failure Survival Score, HFSS) was prospectively evaluated in a sample of ambulatory patients with congestive heart failure (CHF). Additionally, it was investigated whether the HFSS might be simplified by assessment of the distance ambulated during a 6-min walk test (6'WT) instead of determination of peak oxygen uptake (peak VO(2)). In 208 middle-aged CHF patients (age 54+/-10 years, 82% male, NYHA class 2.3+/-0.7; follow-up 28+/-14 months) the seven variables of the HFSS: CHF aetiology; heart rate; mean arterial pressure; serum sodium concentration; intraventricular conduction time; left ventricular ejection fraction (LVEF); and peak VO(2), were determined. Additionally, a 6'WT was performed. The HFSS allowed discrimination between patients at low, medium and high risk, with mortality rates of 16, 39 and 50%, respectively. However, the prognostic power of the HFSS was not superior to a two-variable model consisting only of LVEF and peak VO(2). The areas under the receiver operating curves (AUC) for prediction of 1-year survival were even higher for the two-variable model (0.84 vs. 0.74, P<0.05). Replacing peak VO(2) with 6'WT resulted in a similar AUC (0.83). The HFSS continued to predict survival when applied to this patient sample. However, the HFSS was inferior to a two-variable model containing only LVEF and either peak VO(2) or 6'WT. As the 6'WT requires no sophisticated equipment, a simplified two-variable model containing only LVEF and 6'WT may be more widely applicable, and is therefore recommended.
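
    The AUC values compared in this abstract can be read as the probability that a randomly chosen survivor receives a better score than a randomly chosen non-survivor. The following is a minimal sketch of that pairwise (Mann-Whitney) computation; the scores are hypothetical and the function name is an assumption.

```python
def auc_from_scores(scores_positive, scores_negative):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    if not scores_positive or not scores_negative:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical model scores for 1-year survivors and non-survivors.
survivors = [0.9, 0.8, 0.4]
non_survivors = [0.5, 0.3, 0.2]
auc = auc_from_scores(survivors, non_survivors)
```

    An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.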

  11. Optimizing Scoring and Sampling Methods for Assessing Built Neighborhood Environment Quality in Residential Areas

    Directory of Open Access Journals (Sweden)

    Joel Adu-Brimpong

    2017-03-01

    Optimization of existing measurement tools is necessary to explore links between aspects of the neighborhood built environment and health behaviors or outcomes. We evaluate a scoring method for virtual neighborhood audits utilizing the Active Neighborhood Checklist (the Checklist), a neighborhood audit measure, and assess street segment representativeness in low-income neighborhoods. Eighty-two home neighborhoods of Washington, D.C. Cardiovascular Health/Needs Assessment (NCT01927783) participants were audited using Google Street View imagery and the Checklist (five sections with 89 total questions). Twelve street segments per home address were assessed for (1) Land-Use Type; (2) Public Transportation Availability; (3) Street Characteristics; (4) Environment Quality and (5) Sidewalks/Walking/Biking features. Checklist items were scored 0–2 points/question. A combinations algorithm was developed to assess street segments’ representativeness. Spearman correlations were calculated between built environment quality scores and Walk Score®, a validated neighborhood walkability measure. Street segment quality scores ranged 10–47 (Mean = 29.4 ± 6.9) and overall neighborhood quality scores, 172–475 (Mean = 352.3 ± 63.6). Walk Scores® ranged 0–91 (Mean = 46.7 ± 26.3). Street segment combinations’ correlation coefficients ranged 0.75–1.0. Significant positive correlations were found between overall neighborhood quality scores, four of the five Checklist subsection scores, and Walk Scores® (r = 0.62, p < 0.001). This scoring method adequately captures neighborhood features in low-income, residential areas and may aid in delineating impact of specific built environment features on health behaviors and outcomes.

  12. Source Country Differences in Test Score Gaps: Evidence from Denmark

    Science.gov (United States)

    Rangvid, Beatrice Schindler

    2010-01-01

    We combine data from three studies for Denmark in the PISA 2000 framework to investigate differences in the native-immigrant test score gap by country of origin. In addition to the controls available from PISA data sources, we use student-level data on home background and individual migration histories linked from administrative registers. We find…

  13. Low default credit scoring using two-class non-parametric kernel density estimation

    CSIR Research Space (South Africa)

    Rademeyer, E

    2016-12-01

    This paper investigates the performance of two-class classification credit scoring data sets with low default ratios. The standard two-class parametric Gaussian and non-parametric Parzen classifiers are extended, using Bayes’ rule, to include either...

  14. A comparative study on assessment procedures and metric properties of two scoring systems of the Coma Recovery Scale-Revised items: standard and modified scores.

    Science.gov (United States)

    Sattin, Davide; Lovaglio, Piergiorgio; Brenna, Greta; Covelli, Venusia; Rossi Sebastiano, Davide; Duran, Dunja; Minati, Ludovico; Giovannetti, Ambra Mara; Rosazza, Cristina; Bersano, Anna; Nigri, Anna; Ferraro, Stefania; Leonardi, Matilde

    2017-09-01

    The study compared the metric characteristics (discriminant capacity and factorial structure) of two different methods for scoring the items of the Coma Recovery Scale-Revised and it analysed scale scores collected using the standard assessment procedure and a new proposed method. Cross sectional design/methodological study. Inpatient, neurological unit. A total of 153 patients with disorders of consciousness were consecutively enrolled between 2011 and 2013. All patients were assessed with the Coma Recovery Scale-Revised using standard (rater 1) and inverted (rater 2) procedures. Coma Recovery Scale-Revised score, number of cognitive and reflex behaviours and diagnosis. Regarding patient assessment, rater 1 using standard and rater 2 using inverted procedures obtained the same best scores for each subscale of the Coma Recovery Scale-Revised for all patients, so no clinical (and statistical) difference was found between the two procedures. In 11 patients (7.7%), rater 2 noted that some Coma Recovery Scale-Revised codified behavioural responses were not found during assessment, although higher response categories were present. A total of 51 (36%) patients presented the same Coma Recovery Scale-Revised scores of 7 or 8 using a standard score, whereas no overlap was found using the modified score. Unidimensionality was confirmed for both score systems. The Coma Recovery Scale Modified Score showed a higher discriminant capacity than the standard score and a monofactorial structure was also supported. The inverted assessment procedure could be a useful evaluation method for the assessment of patients with disorder of consciousness diagnosis.

  15. Effects of correcting for prematurity on cognitive test scores in childhood.

    Science.gov (United States)

    Wilson-Ching, Michelle; Pascoe, Leona; Doyle, Lex W; Anderson, Peter J

    2014-03-01

    The American Academy of Pediatrics recommends that test scores should be corrected for prematurity up to 3 years of age, but this practice varies greatly in both clinical and research settings. The aim of this study was to contrast the effects of using chronological age and those of using corrected age on measures of cognitive outcome across childhood. A theoretical model was constructed using norms from the Bayley Scales of Infant and Toddler Development, Third Edition; the Wechsler Preschool and Primary Scale of Intelligence, Third Edition Australian; and the Wechsler Intelligence Scales for Children, Fourth Edition Australian. Baseline scores representing different levels of functioning (70, below average; 85, borderline; and 100, average) were recalculated using the normative data for ages 6 months to 16 years to account for 1, 2, 3 and 4 months of prematurity. The model created depicted the difference in standardised scores between chronological and corrected age. Compared with scores corrected for prematurity, the absolute reduction in scores using chronological age was greater for increasing degree of prematurity, younger ages at assessment and higher baseline scores and was substantial even beyond 3 years of age. However, the pattern was erratic, with considerable fluctuation evident across different ages and baseline scores. Chronological age results in a lowering of scores at all ages for preterm-born subjects that is greater in the first few years and in those born at earlier gestational ages. Whether or not to correct for prematurity depends upon the context of the assessment. © 2014 The Authors. Journal of Paediatrics and Child Health © 2014 Paediatrics and Child Health Division (Royal Australasian College of Physicians).

  16. Predicting Freshman Grade Point Average From College Admissions Test Scores and State High School Test Scores

    Directory of Open Access Journals (Sweden)

    Daniel Koretz

    2016-09-01

    The current focus on assessing “college and career readiness” raises an empirical question: How do high school tests compare with college admissions tests in predicting performance in college? We explored this using data from the City University of New York and public colleges in Kentucky. These two systems differ in the choice of college admissions test, the stakes for students on the high school test, and demographics. We predicted freshman grade point average (FGPA) from high school GPA and both college admissions and high school tests in mathematics and English. In both systems, the choice of tests had only trivial effects on the aggregate prediction of FGPA. Adding either test to an equation that included the other had only trivial effects on prediction. Although the findings suggest that the choice of test might advantage or disadvantage different students, it had no substantial effect on the over- and underprediction of FGPA for students classified by race-ethnicity or poverty.

  17. The Effects of Group Members' Personalities on a Test Taker's L2 Group Oral Discussion Test Scores

    Science.gov (United States)

    Ockey, Gary J.

    2009-01-01

    The second language group oral is a test of second language speaking proficiency, in which a group of three or more English language learners discuss an assigned topic without interaction with interlocutors. Concerns expressed about the extent to which test takers' personal characteristics affect the scores of others in the group have limited its…

  18. Standardised test protocol (Constant Score) for evaluation of functionality in patients with shoulder disorders

    DEFF Research Database (Denmark)

    Ban, Ilija; Troelsen, Anders; Christiansen, David Høyrup

    2013-01-01

    INTRODUCTION: The Constant Score (CS), developed as a scoring system to evaluate overall functionality of patients with shoulder disorders, is widely used but has been criticised for relying on an imprecise terminology and for lack of a standardised methodology. A modified guideline was therefore...... differences. One of the authors of the modified CS approved both the English and the Danish test protocol. CONCLUSION: A simple test protocol of the modified CS was developed in both English and Danish. With precise terminology and definitions, the test protocol is the first of its kind. We suggest its use...

  19. Psychometric Quality of the Dutch Version of the Children's Eating Attitude Test in a Community Sample and a Sample of Overweight Youngsters

    Directory of Open Access Journals (Sweden)

    Lotte Theuwis

    2010-12-01

    Introduction. Disturbed eating attitudes may be important precursors of pathological eating patterns and therefore need to be researched adequately. The Children's Eating Attitude Test (ChEAT) is indicated for detecting at-risk attitudes and concerns in youngsters. Method. The present study was designed to provide a preliminary psychometric evaluation of the Dutch version of the ChEAT, by examining reliability and validity in a sample of 166 youngsters. Results. Generally the ChEAT seems to be a reliable instrument. Concurrent validity was demonstrated by positive correlations with measures assessing pathological eating behaviour and with related psychological problems. The discriminant validity was good. Based on ChEAT scores we can distinguish overweight youngsters from the community sample and “dieters” from “non dieters”. Divergent validity and factor structure still show shortcomings. Discussion. The Dutch version of the ChEAT seems to be a promising screening and research instrument. Future prospective research could focus on a cut-off score for identifying at-risk youngsters.

  20. Opportunity to learn: Investigating possible predictors for pre-course Test Of Astronomy STandards (TOAST) scores

    Science.gov (United States)

    Berryhill, Katie J.

    As astronomy education researchers become more interested in experimentally testing innovative teaching strategies to enhance learning in introductory astronomy survey courses ("ASTRO 101"), scholars are placing increased attention toward better understanding factors impacting student gain scores on the widely used Test Of Astronomy STandards (TOAST). Usually used in a pre-test and post-test research design, one might naturally assume that the pre-course differences observed between high- and low-scoring college students might be due in large part to their pre-existing motivation, interest, experience in science, and attitudes about astronomy. To explore this notion, 11 non-science majoring undergraduates taking ASTRO 101 at west coast community colleges were interviewed in the first few weeks of the course to better understand students' pre-existing affect toward learning astronomy with an eye toward predicting student success. In answering this question, we hope to contribute to our understanding of the incoming knowledge of students taking undergraduate introductory astronomy classes, but also gain insight into how faculty can best meet those students' needs and assist them in achieving success. Perhaps surprisingly, there was only weak correlation between students' motivation toward learning astronomy and their pre-test scores. Instead, the most fruitful predictor of TOAST pre-test scores was the quantity of pre-existing, informal, self-directed astronomy learning experiences.

  1. Tests on standard concrete samples

    CERN Multimedia

    CERN PhotoLab

    1973-01-01

    Compression and tensile tests on standard concrete samples. The use of centrifugal force in tensile testing has been developed by the SB Division and the instruments were built in the Central workshops.

  2. The Alzheimer's prevention initiative composite cognitive test score: sample size estimates for the evaluation of preclinical Alzheimer's disease treatments in presenilin 1 E280A mutation carriers.

    Science.gov (United States)

    Ayutyanont, Napatkamon; Langbaum, Jessica B S; Hendrix, Suzanne B; Chen, Kewei; Fleisher, Adam S; Friesenhahn, Michel; Ward, Michael; Aguirre, Camilo; Acosta-Baena, Natalia; Madrigal, Lucìa; Muñoz, Claudia; Tirado, Victoria; Moreno, Sonia; Tariot, Pierre N; Lopera, Francisco; Reiman, Eric M

    2014-06-01

    To identify a cognitive composite that is sensitive to tracking preclinical Alzheimer's disease decline to be used as a primary end point in treatment trials. We capitalized on longitudinal data collected from 1995 to 2010 from cognitively unimpaired presenilin 1 (PSEN1) E280A mutation carriers from the world's largest known early-onset autosomal dominant Alzheimer's disease kindred to identify a composite cognitive test with the greatest statistical power to track preclinical Alzheimer's disease decline and estimate the number of carriers age 30 years and older needed to detect a treatment effect in the Alzheimer's Prevention Initiative's (API) preclinical Alzheimer's disease treatment trial. The mean-to-standard-deviation ratios (MSDRs) of change over time were calculated in a search for the optimal combination of 1 to 7 cognitive tests/subtests drawn from the neuropsychological test battery in cognitively unimpaired mutation carriers during a 2- and 5-year follow-up period (n = 78 and 57), using data from noncarriers (n = 31 and 56) during the same time period to correct for aging and practice effects. Combinations that performed well were then evaluated for robustness across follow-up years, occurrence of selected items within top-performing combinations, and representation of relevant cognitive domains. The optimal test combination included Consortium to Establish a Registry for Alzheimer's Disease (CERAD) Word List Recall, CERAD Boston Naming Test (high frequency items), Mini-Mental State Examination (MMSE) Orientation to Time, CERAD Constructional Praxis, and Raven's Progressive Matrices (Set A), with an MSDR of 1.62. This composite is more sensitive than using either the CERAD Word List Recall (MSDR = 0.38) or the entire CERAD-Col battery (MSDR = 0.76). A sample size of 75 cognitively normal PSEN1 E280A mutation carriers aged 30 years and older per treatment arm allows for a detectable treatment effect of 29% in a 60-month trial (80% power, P = .05). We

  3. On Using a Pilot Sample Variance for Sample Size Determination in the Detection of Differences between Two Means: Power Consideration

    Science.gov (United States)

    Shieh, Gwowen

    2013-01-01

    The a priori determination of a proper sample size necessary to achieve some specified power is an important problem encountered frequently in practical studies. To establish the needed sample size for a two-sample "t" test, researchers may conduct the power analysis by specifying scientifically important values as the underlying population means…

  4. Construction of an Exome-Wide Risk Score for Schizophrenia Based on a Weighted Burden Test.

    Science.gov (United States)

    Curtis, David

    2018-01-01

    Polygenic risk scores obtained as a weighted sum of associated variants can be used to explore association in additional data sets and to assign risk scores to individuals. The methods used to derive polygenic risk scores from common SNPs are not suitable for variants detected in whole exome sequencing studies. Rare variants, which may have major effects, are seen too infrequently to judge whether they are associated and may not be shared between training and test subjects. A method is proposed whereby variants are weighted according to their frequency, their annotations and the genes they affect. A weighted sum across all variants provides an individual risk score. Scores constructed in this way are used in a weighted burden test and are shown to be significantly different between schizophrenia cases and controls using a five-way cross-validation procedure. This approach represents a first attempt to summarise exome sequence variation into a summary risk score, which could be combined with risk scores from common variants and from environmental factors. It is hoped that the method could be developed further. © 2017 John Wiley & Sons Ltd/University College London.
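
    The weighted-sum score described in this abstract can be sketched as follows. The abstract says only that weights depend on variant frequency, annotation and affected gene; it does not give the weighting function, so the weights below are supplied as hypothetical inputs and the identifiers are placeholders.

```python
def weighted_risk_score(genotype_counts, variant_weights):
    """Sum of weight * alternate-allele count over the variants a subject carries.

    genotype_counts: dict mapping variant id -> allele count (0, 1 or 2).
    variant_weights: dict mapping variant id -> precomputed weight.
    Variants with no weight defined contribute nothing to the score.
    """
    return sum(variant_weights.get(variant, 0.0) * count
               for variant, count in genotype_counts.items())


# Hypothetical weights and one subject's genotypes, for illustration only.
weights = {"var_1": 2.0, "var_2": 0.5, "var_3": 10.0}
subject = {"var_1": 1, "var_2": 2}
score = weighted_risk_score(subject, weights)
```

    Scores computed this way for cases and controls can then be compared in a burden test, with weights derived on training folds and applied to held-out subjects, as in the cross-validation the abstract describes.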

  5. Pediatric residents' learning styles and temperaments and their relationships to standardized test scores.

    Science.gov (United States)

    Tuli, Sanjeev Y; Thompson, Lindsay A; Saliba, Heidi; Black, Erik W; Ryan, Kathleen A; Kelly, Maria N; Novak, Maureen; Mellott, Jane; Tuli, Sonal S

    2011-12-01

    Board certification is an important professional qualification and a prerequisite for credentialing, and the Accreditation Council for Graduate Medical Education (ACGME) assesses board certification rates as a component of residency program effectiveness. To date, research has shown that preresidency measures, including National Board of Medical Examiners scores, Alpha Omega Alpha Honor Medical Society membership, or medical school grades poorly predict postresidency board examination scores. However, learning styles and temperament have been identified as factors that affect test-taking performance. The purpose of this study is to characterize the learning styles and temperaments of pediatric residents and to evaluate their relationships to yearly in-service and postresidency board examination scores. This cross-sectional study analyzed the learning styles and temperaments of current and past pediatric residents by administration of 3 validated tools: the Kolb Learning Style Inventory, the Keirsey Temperament Sorter, and the Felder-Silverman Learning Style test. These results were compared with known, normative, general and medical population data and evaluated for correlation to in-service examination and postresidency board examination scores. The predominant learning style for pediatric residents was converging 44% (33 of 75 residents) and the predominant temperament was guardian 61% (34 of 56 residents). The learning style and temperament distribution of the residents was significantly different from published population data (P = .002 and .04, respectively). Learning styles, with one exception, were found to be unrelated to standardized test scores. The predominant learning style and temperament of pediatric residents is significantly different than that of the populations of general and medical trainees. However, learning styles and temperament do not predict outcomes on standardized in-service and board examinations in pediatric residents.

  6. Sample test cases using the environmental computer code NECTAR

    International Nuclear Information System (INIS)

    Ponting, A.C.

    1984-06-01

    This note demonstrates a few of the many different ways in which the environmental computer code NECTAR may be used. Four sample test cases are presented and described to show how NECTAR input data are structured. Edited output is also presented to illustrate the format of the results. Two test cases demonstrate how NECTAR may be used to study radio-isotopes not explicitly included in the code. (U.K.)

  7. A Summary Score for the Framingham Heart Study Neuropsychological Battery.

    Science.gov (United States)

    Downer, Brian; Fardo, David W; Schmitt, Frederick A

    2015-10-01

    To calculate three summary scores of the Framingham Heart Study neuropsychological battery and determine which score best differentiates between subjects classified as having normal cognition, test-based impaired learning and memory, test-based multidomain impairment, and dementia. The final sample included 2,503 participants. Three summary scores were assessed: (a) composite score that provided equal weight to each subtest, (b) composite score that provided equal weight to each cognitive domain assessed by the neuropsychological battery, and (c) abbreviated score comprised of subtests for learning and memory. Receiver operating characteristic analysis was used to determine which summary score best differentiated between the four cognitive states. The summary score that provided equal weight to each subtest best differentiated between the four cognitive states. A summary score that provides equal weight to each subtest is an efficient way to utilize all of the cognitive data collected by a neuropsychological battery. © The Author(s) 2015.
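
    The difference between the first two summary scores (equal weight per subtest versus equal weight per cognitive domain) matters whenever domains contain unequal numbers of subtests. The sketch below illustrates that difference; it assumes the subtest scores have already been standardized to a common scale, which the abstract does not state, and the domain names and values are hypothetical.

```python
from statistics import mean


def subtest_weighted_score(domains):
    """Average over every subtest, so larger domains carry more weight."""
    return mean(score for scores in domains.values() for score in scores)


def domain_weighted_score(domains):
    """Average the per-domain means, so every domain counts equally."""
    return mean(mean(scores) for scores in domains.values())


# Hypothetical standardized subtest scores grouped by cognitive domain.
battery = {"memory": [1.0, 3.0], "language": [0.0]}
by_subtest = subtest_weighted_score(battery)
by_domain = domain_weighted_score(battery)
```

    Here the memory domain has two subtests, so it dominates the subtest-weighted score but counts for only half of the domain-weighted score.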

  8. Genome-Wide Polygenic Scores Predict Reading Performance Throughout the School Years.

    Science.gov (United States)

    Selzam, Saskia; Dale, Philip S; Wagner, Richard K; DeFries, John C; Cederlöf, Martin; O'Reilly, Paul F; Krapohl, Eva; Plomin, Robert

    2017-07-04

    It is now possible to create individual-specific genetic scores, called genome-wide polygenic scores (GPS). We used a GPS for years of education (EduYears) to predict reading performance assessed at UK National Curriculum Key Stages 1 (age 7), 2 (age 12) and 3 (age 14) and on reading tests administered at ages 7 and 12 in a UK sample of 5,825 unrelated individuals. EduYears GPS accounts for up to 5% of the variance in reading performance at age 14. GPS predictions remained significant after accounting for general cognitive ability and family socioeconomic status. Reading performance of children in the lowest and highest 12.5% of the EduYears GPS distribution differed by a mean growth in reading ability of approximately two school years. It seems certain that polygenic scores will be used to predict strengths and weaknesses in education.

  9. Are WISC IQ scores in children with mathematical learning disabilities underestimated? The influence of a specialized intervention on test performance.

    Science.gov (United States)

    Lambert, Katharina; Spinath, Birgit

    2018-01-01

    Intelligence measures play a pivotal role in the diagnosis of mathematical learning disabilities (MLD). Probably as a result of math-related material in IQ tests, children with MLD often display reduced IQ scores. However, it remains unclear whether the effects of math remediation extend to IQ scores. The present study investigated the impact of a special remediation program compared to a control group receiving private tutoring (PT) on the WISC IQ scores of children with MLD. We included N=45 MLD children (7-12 years) in a study with a pre- and post-test control group design. Children received remediation for two years on average. The analyses revealed significantly greater improvements in the experimental group on the Full-Scale IQ, and the Verbal Comprehension, Perceptual Reasoning, and Working Memory indices, but not Processing Speed, compared to the PT group. Children in the experimental group showed an average WISC IQ gain of more than ten points. Results indicate that the WISC IQ scores of MLD children might be underestimated and that an effective math intervention can improve WISC IQ test performance. Taking limitations into account, we discuss the use of IQ measures more generally for defining MLD in research and practice. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Experimental and Sampling Design for the INL-2 Sample Collection Operational Test

    Energy Technology Data Exchange (ETDEWEB)

    Piepel, Gregory F.; Amidan, Brett G.; Matzke, Brett D.

    2009-02-16

    This report describes the experimental and sampling design developed to assess sampling approaches and methods for detecting contamination in a building and clearing the building for use after decontamination. An Idaho National Laboratory (INL) building will be contaminated with BG (Bacillus globigii, renamed Bacillus atrophaeus), a simulant for Bacillus anthracis (BA). The contamination, sampling, decontamination, and re-sampling will occur per the experimental and sampling design. This INL-2 Sample Collection Operational Test is being planned by the Validated Sampling Plan Working Group (VSPWG). The primary objectives are: 1) Evaluate judgmental and probabilistic sampling for characterization as well as probabilistic and combined (judgment and probabilistic) sampling approaches for clearance, 2) Conduct these evaluations for gradient contamination (from low or moderate down to absent or undetectable) for different initial concentrations of the contaminant, 3) Explore judgment composite sampling approaches to reduce sample numbers, 4) Collect baseline data to serve as an indication of the actual levels of contamination in the tests. A combined judgmental and random (CJR) approach uses Bayesian methodology to combine judgmental and probabilistic samples to make clearance statements of the form "X% confidence that at least Y% of an area does not contain detectable contamination" (X%/Y% clearance statements). The INL-2 experimental design has five test events, which 1) vary the floor of the INL building on which the contaminant will be released, 2) provide for varying the amount of contaminant released to obtain desired concentration gradients, and 3) investigate overt as well as covert release of contaminants. Desirable contaminant gradients would have moderate to low concentrations of contaminant in rooms near the release point, with concentrations down to zero in other rooms. Such gradients would provide a range of contamination levels to challenge the sampling

  11. The predictive value of an adjusted COPD assessment test score on the risk of respiratory-related hospitalizations in severe COPD patients.

    Science.gov (United States)

    Barton, Christopher A; Bassett, Katherine L; Buckman, Julie; Effing, Tanja W; Frith, Peter A; van der Palen, Job; Sloots, Joanne M

    2017-02-01

    We evaluated whether a chronic obstructive pulmonary disease (COPD) assessment test (CAT) with adjusted weights for the CAT items could better predict future respiratory-related hospitalizations than the original CAT. Two focus groups (respiratory nurses and physicians) generated two adjusted CAT algorithms. Two multivariate logistic regression models for infrequent (≤1/year) versus frequent (>1/year) future respiratory-related hospitalizations were defined: one with the adjusted CAT score that correlated best with future hospitalizations and one with the original CAT score. Patient characteristics related to future hospitalizations ( p ≤ 0.2) were also entered. Eighty-two COPD patients were included. The CAT algorithm derived from the nurse focus group was a borderline significant predictor of hospitalization risk (odds ratio (OR): 1.07; 95% confidence interval (CI): 1.00-1.14; p = 0.050) in a model that also included hospitalization frequency in the previous year (OR: 3.98; 95% CI: 1.30-12.16; p = 0.016) and anticholinergic risk score (OR: 3.08; 95% CI: 0.87-10.89; p = 0.081). Presence of ischemic heart disease and/or heart failure appeared 'protective' (OR: 0.17; 95% CI: 0.05-0.62; p = 0.007). The original CAT score was not significantly associated with hospitalization risk. In conclusion, as a predictor of respiratory-related hospitalizations, an adjusted CAT score was marginally significant (although the original CAT score was not). 'Previous respiratory-related hospitalizations' was the strongest factor in this equation.

  12. ¿Exito en California? A Validity Critique of Language Program Evaluations and Analysis of English Learner Test Scores

    Directory of Open Access Journals (Sweden)

    Marilyn S. Thompson

    2002-01-01

    Several states have recently faced ballot initiatives that propose to functionally eliminate bilingual education in favor of English-only approaches. Proponents of these initiatives have argued that an overall rise in standardized achievement scores of California's limited English proficient (LEP) students is largely due to the implementation of English immersion programs mandated by Proposition 227 in 1998; hence, they claim Exito en California (Success in California). However, many such arguments presented in the media were based on flawed summaries of these data. We first discuss the background, media coverage, and previous research associated with California's Proposition 227. We then present a series of validity concerns regarding use of Stanford-9 achievement data to address policy for educating LEP students; these concerns include the language of the test, alternative explanations, sample selection, and data analysis decisions. Finally, we present a comprehensive summary of scaled-score achievement means and trajectories for California's LEP and non-LEP students for 1998-2000. Our analyses indicate that although scores have risen overall, the achievement gap between LEP and EP students does not appear to be narrowing.

  13. Development of an objective gene expression panel as an alternative to self-reported symptom scores in human influenza challenge trials.

    Science.gov (United States)

    Muller, Julius; Parizotto, Eneida; Antrobus, Richard; Francis, James; Bunce, Campbell; Stranks, Amanda; Nichols, Marshall; McClain, Micah; Hill, Adrian V S; Ramasamy, Adaikalavan; Gilbert, Sarah C

    2017-06-08

    Influenza challenge trials are important for vaccine efficacy testing. Currently, disease severity is determined by self-reported scores to a list of symptoms which can be highly subjective. A more objective measure would allow for improved data analysis. Twenty-one volunteers participated in an influenza challenge trial. We calculated the daily sum of scores (DSS) for a list of 16 influenza symptoms. Whole blood collected at baseline and 24, 48, 72 and 96 h post challenge was profiled on Illumina HT12v4 microarrays. Changes in gene expression most strongly correlated with DSS were selected to train a Random Forest model and tested on two independent test sets consisting of 41 individuals profiled on a different microarray platform and 33 volunteers assayed by qRT-PCR. 1456 probes are significantly associated with DSS at 1% false discovery rate. We selected 19 genes with the largest fold change to train a random forest model. We observed good concordance between predicted and actual scores in the first test set (r = 0.57; RMSE = -16.1%) with the greatest agreement achieved on samples collected approximately 72 h post challenge. Therefore, we assayed samples collected at baseline and 72 h post challenge in the second test set by qRT-PCR and observed good concordance (r = 0.81; RMSE = -36.1%). We developed a 19-gene qRT-PCR panel to predict DSS, validated on two independent datasets. A transcriptomics based panel could provide a more objective measure of symptom scoring in future influenza challenge studies. Trial registration Samples were obtained from a clinical trial with the ClinicalTrials.gov Identifier: NCT02014870, first registered on December 5, 2013.

  14. ACER Mathematics Profile Series: Number Test. (Test Booklet, Answer and Record Sheet, Score Key, and Teachers Handbook).

    Science.gov (United States)

    Cornish, Greg; Wines, Robin

    The Number Test of the ACER Mathematics Profile Series contains 30 items for each of three suggested grade levels: 7-8, 8-9, and 9-10. Raw scores on all tests in the ACER Mathematics Profile Series (Number, Operations, Space and Measurement) are converted to a common scale called MAPS, a major feature of the Series. Based on the Rasch Model,…

  15. Social orientation, sexual role, and moral judgment: a comparison of two brazilian and one norwegian sample / Orientação social, papel sexual e julgamento moral: uma comparação entre duas amostras brasileiras e uma norueguesa

    Directory of Open Access Journals (Sweden)

    Angela Biaggio

    2005-01-01

    Thirty female and 30 male university students each from Joao Pessoa and Porto Alegre were compared to a comparable Norwegian sample of 60 female and 60 male students. Except for a suggestion of differences in women's cultural orientation, comparisons on Gibbs' test of justice morality, the ECI test for ethic of care, Bem's sex role inventory, and Triandis' test for cultural orientations showed that all differences were between the Norwegian sample and the Brazilian samples as a unit. Brazilians showed a differentiation of sex roles, which was not shown in Norwegians, and higher scores on the collectivism cultural orientation. Norwegians showed higher scores on the ECI, which might be because of a culture bias in the test. No difference was shown for individualism cultural orientation or on Gibbs' test. Men scored higher on the total individualism measure, and women on vertical collectivism. JP women scored as more hedonistic and individual than the PA women, who scored as more traditional than the JP women.

  16. Polytrauma Defined by the New Berlin Definition: A Validation Test Based on Propensity-Score Matching Approach.

    Science.gov (United States)

    Rau, Cheng-Shyuan; Wu, Shao-Chun; Kuo, Pao-Jen; Chen, Yi-Chun; Chien, Peng-Chen; Hsieh, Hsiao-Yun; Hsieh, Ching-Hua

    2017-09-11

    Background: Polytrauma patients are expected to have a higher risk of mortality than that obtained by the summation of expected mortality owing to their individual injuries. This study was designed to investigate the outcome of patients with polytrauma, which was defined using the new Berlin definition, as cases with an Abbreviated Injury Scale (AIS) ≥ 3 for two or more different body regions and one or more additional variables from five physiologic parameters (hypotension [systolic blood pressure ≤ 90 mmHg], unconsciousness [Glasgow Coma Scale score ≤ 8], acidosis [base excess ≤ -6.0], coagulopathy [partial thromboplastin time ≥ 40 s or international normalized ratio ≥ 1.4], and age [≥70 years]). Methods: We retrieved detailed data on 369 polytrauma patients and 1260 non-polytrauma patients with an overall Injury Severity Score (ISS) ≥ 18 who were hospitalized between 1 January 2009 and 31 December 2015 for the treatment of all traumatic injuries, from the Trauma Registry System at a level I trauma center. Patients with burn injury or incomplete registered data were excluded. Categorical data were compared with two-sided Fisher exact or Pearson chi-square tests. The unpaired Student t-test and the Mann-Whitney U-test were used to analyze normally distributed continuous data and non-normally distributed data, respectively. Propensity-score matched cohort in a 1:1 ratio was allocated using the NCSS software with logistic regression to evaluate the effect of polytrauma on patient outcomes. Results: The polytrauma patients had a significantly higher ISS than non-polytrauma patients (median (interquartile range Q1-Q3), 29 (22-36) vs. 24 (20-25), respectively; p Polytrauma patients had a 1.9-fold higher odds of mortality than non-polytrauma patients (95% CI 1.38-2.49; p polytrauma patients, polytrauma patients had a substantially longer hospital length of stay (LOS). In addition, a higher proportion of polytrauma patients were admitted to the intensive

  17. Predisposing factors of pneumothorax in percutaneous transthoracic fine needle aspiration biopsy: comparison between CT emphysema score and pulmonary function test

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Chang Ho; Park, Kyung Joo; Park, Dong Won; Jung, Kyung Il; Suh, Jung Ho [Ajou Univ. College of Medicine, Seoul (Korea, Republic of)

    1997-11-01

    To compare the CT emphysema score with various factors of pulmonary function test by simple spirometry and to use the result as a predictor of pneumothorax in percutaneous transthoracic fine needle aspiration biopsy. The CT scans of 106 patients who had undergone percutaneous transthoracic fine needle aspiration biopsy of lung lesions within the previous 18 months were retrospectively reviewed. In 75 of these 106 cases, the results of the pulmonary function test were also reviewed. On plain chest radiography, pneumothorax was noted in 20 cases (19%). Emphysema was blindly evaluated. We divided each lung into four segments and determined the severity and involved volume of emphysema, as seen on CT. Severity was classified as one of four grades, as follows: absence of emphysema=0; low attenuation area of less than 5mm=1; low attenuation area of more than 5mm, and vascular pruning with normal lung intervening=2; and diffuse low attenuation without intervening normal lung, and larger confluent low attenuation with vascular pruning and distortion of branching pattern occupying all or almost all the involved parenchyma=3. The involved area was also classified as one of four grades: less than 25%=1; 25 - 49%=2; 51 - 74%=3; and more than 75%=4. The CT emphysema score was defined as the average of the grade of severity multiplied by the grade of involved area. Pulmonary function tests, consisting of simple spirometry and a pulmonologist's interpretation, were evaluated. We also evaluated depth and size of lesion as known predisposing factors in postbioptic pneumothorax. Statistical analysis was performed using the chi-square test, Wilcoxon rank sum W test and the Student t test. A comparison between the two groups of occurrence (with or without pneumothorax) showed the emphysema scores to be 1.69±2.0 and 1.11±2.9, respectively; there was thus no significant difference between the two groups (z= - 0.048, p>0.10). Nor were differences revealed by the

  18. Predisposing factors of pneumothorax in percutaneous transthoracic fine needle aspiration biopsy: comparison between CT emphysema score and pulmonary function test

    International Nuclear Information System (INIS)

    Lee, Chang Ho; Park, Kyung Joo; Park, Dong Won; Jung, Kyung Il; Suh, Jung Ho

    1997-01-01

    To compare the CT emphysema score with various factors of pulmonary function test by simple spirometry and to use the result as a predictor of pneumothorax in percutaneous transthoracic fine needle aspiration biopsy. The CT scans of 106 patients who had undergone percutaneous transthoracic fine needle aspiration biopsy of lung lesions within the previous 18 months were retrospectively reviewed. In 75 of these 106 cases, the results of the pulmonary function test were also reviewed. On plain chest radiography, pneumothorax was noted in 20 cases (19%). Emphysema was blindly evaluated. We divided each lung into four segments and determined the severity and involved volume of emphysema, as seen on CT. Severity was classified as one of four grades, as follows: absence of emphysema=0; low attenuation area of less than 5mm=1; low attenuation area of more than 5mm, and vascular pruning with normal lung intervening=2; and diffuse low attenuation without intervening normal lung, and larger confluent low attenuation with vascular pruning and distortion of branching pattern occupying all or almost all the involved parenchyma=3. The involved area was also classified as one of four grades: less than 25%=1; 25 - 49%=2; 51 - 74%=3; and more than 75%=4. The CT emphysema score was defined as the average of the grade of severity multiplied by the grade of involved area. Pulmonary function tests, consisting of simple spirometry and a pulmonologist's interpretation, were evaluated. We also evaluated depth and size of lesion as known predisposing factors in postbioptic pneumothorax. Statistical analysis was performed using the chi-square test, Wilcoxon rank sum W test and the Student t test. A comparison between the two groups of occurrence (with or without pneumothorax) showed the emphysema scores to be 1.69±2.0 and 1.11±2.9, respectively; there was thus no significant difference between the two groups (z= - 0.048, p>0.10). Nor were differences revealed by the pulmonary

  19. Test equality between two binary screening tests with a confirmatory procedure restricted on screen positives.

    Science.gov (United States)

    Lui, Kung-Jong; Chang, Kuang-Chao

    2015-01-01

    In studies of screening accuracy, we commonly encounter data in which, for ethical reasons, a confirmatory procedure is administered only to subjects who screen positive. We focus our discussion on simultaneously testing equality of sensitivity and specificity between two binary screening tests when only subjects with screen positives receive the confirmatory procedure. We develop four asymptotic test procedures and one exact test procedure. We derive a sample size calculation formula for a desired power of detecting a difference at a given nominal α-level. We employ Monte Carlo simulation to evaluate the performance of these test procedures and the accuracy of the sample size calculation formula developed here in a variety of situations. Finally, we use the data obtained from a study of the prostate-specific-antigen test and digital rectal examination test on 949 Black men to illustrate the practical use of these test procedures and the sample size calculation formula.

  20. Adolescent Psychopathy and the Big Five: Results from Two Samples

    Science.gov (United States)

    Lynam, Donald R.; Caspi, Avshalom; Moffitt, Terrie E.; Raine, Adrian; Loeber, Rolf; Stouthamer-Loeber, Magda

    2005-01-01

    The present study examines the relation between psychopathy and the Big Five dimensions of personality in two samples of adolescents. Specifically, the study tests the hypothesis that the aspect of psychopathy representing selfishness, callousness, and interpersonal manipulation (Factor 1) is most strongly associated with low Agreeableness,…

  1. Mini mental Parkinson test: standardization and normative data on an Italian sample.

    Science.gov (United States)

    Costa, Alberto; Bagoj, Eriola; Monaco, Marco; Zabberoni, Silvia; De Rosa, Salvatore; Mundi, Ciro; Caltagirone, Carlo; Carlesimo, Giovanni Augusto

    2013-10-01

    The mini mental Parkinson (MMP) is a test built to overcome the limits of the mini mental state examination (MMSE) in the short-time screening of cognitive disorders in individuals with Parkinson's disease (PD). In this scale, items tapping executive functioning are included to better capture PD-related cognitive changes. Some data support the sensitivity and validity of the MMP in the short neuropsychological screening of these individuals. Here, we report normative data on the MMP that we collected from a sample of 307 healthy Italian subjects ranging from 40 to 91 years of age. The results document a detrimental effect of age and an ameliorative effect of education on the MMP total performance score. We provide correction grids for age and literacy that derive from the results of the regression analyses. Moreover, we also computed equivalent scores in order to allow a direct and fast comparison between the performance on the MMP and on other psychometric measures that can be administered to the subjects.

  2. TEST-RETEST RELIABILITY OF THE CLOSED KINETIC CHAIN UPPER EXTREMITY STABILITY TEST (CKCUEST) IN ADOLESCENTS: RELIABILITY OF CKCUEST IN ADOLESCENTS.

    Science.gov (United States)

    de Oliveira, Valéria M A; Pitangui, Ana C R; Nascimento, Vinícius Y S; da Silva, Hítalo A; Dos Passos, Muana H P; de Araújo, Rodrigo C

    2017-02-01

    The Closed Kinetic Chain Upper Extremity Stability Test (CKCUEST) has been proposed as an option to assess upper limb function and stability; however, there are few studies that support the use of this test in adolescents. The purpose of the present study was to investigate the intersession reliability and agreement of three CKCUEST scores in adolescents and establish clinimetric values for this test. Test-retest reliability. Twenty-five healthy adolescents of both sexes were evaluated. The subjects performed two CKCUEST with an interval of one week between the tests. An intraclass correlation coefficient (ICC(3,3)) two-way mixed model with a 95% confidence interval was utilized to determine intersession reliability. A Bland-Altman graph was plotted to analyze the agreement between assessments. The presence of systematic error was evaluated by a one-sample t test. The difference between the evaluation and reevaluation was observed using a paired-sample t test. The level of significance was set at 0.05. Standard errors of measurement and minimum detectable changes were calculated. The intersession reliability values of the average touches score, normalized score, and power score were 0.68, 0.68 and 0.87, the standard errors of measurement were 2.17, 1.35 and 6.49, and the minimal detectable changes were 6.01, 3.74 and 17.98, respectively. The presence of systematic error (p test with moderate to excellent reliability when used with adolescents. The CKCUEST is a measurement with moderate to excellent reliability for adolescents. 2b.
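
    The relation between the standard error of measurement (SEM) and the minimal detectable change (MDC) reported above can be checked with a short sketch. It assumes the common convention MDC95 = 1.96 × √2 × SEM; the abstract does not state which formula the authors used, but the reported values are consistent with it. The function name `mdc95` is illustrative only.

```python
import math


def mdc95(sem: float) -> float:
    """Minimal detectable change at 95% confidence, given the SEM.

    Assumed convention (not stated in the abstract):
    MDC95 = 1.96 * sqrt(2) * SEM, where sqrt(2) accounts for
    measurement error being present in both test sessions.
    """
    return 1.96 * math.sqrt(2) * sem


# (SEM, MDC) pairs as reported in the abstract for the three CKCUEST scores.
reported = {
    "average touches": (2.17, 6.01),
    "normalized": (1.35, 3.74),
    "power": (6.49, 17.98),
}

for name, (sem, mdc) in reported.items():
    # Small gaps are expected because the published SEMs are rounded.
    print(f"{name}: computed {mdc95(sem):.2f}, reported {mdc:.2f}")
```

    Under this assumption the computed values agree with the reported ones to within rounding of the published SEMs.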

  3. Linear-rank testing of a non-binary, responder-analysis, efficacy score to evaluate pharmacotherapies for substance use disorders.

    Science.gov (United States)

    Holmes, Tyson H; Li, Shou-Hua; McCann, David J

    2016-11-23

    The design of pharmacological trials for management of substance use disorders is shifting toward outcomes of successful individual-level behavior (abstinence or no heavy use). While binary success/failure analyses are common, McCann and Li (CNS Neurosci Ther 2012; 18: 414-418) introduced "number of beyond-threshold weeks of success" (NOBWOS) scores to avoid dichotomized outcomes. NOBWOS scoring employs an efficacy "hurdle" with values reflecting duration of success. Here, we evaluate NOBWOS scores rigorously. Formal analysis of mathematical structure of NOBWOS scores is followed by simulation studies spanning diverse conditions to assess operating characteristics of five linear-rank tests on NOBWOS scores. Simulations include assessment of Fisher's exact test applied to hurdle component. On average, statistical power was approximately equal for five linear-rank tests. Under none of conditions examined did Fisher's exact test exhibit greater statistical power than any of the linear-rank tests. These linear-rank tests provide good Type I and Type II error control for comparing distributions of NOBWOS scores between groups (e.g. active vs. placebo). All methods were applied to re-analyses of data from four clinical trials of differing lengths and substances of abuse. These linear-rank tests agreed across all trials in rejecting (or not) their null (equality of distributions) at ≤ 0.05. © The Author(s) 2016.

  4. A Comparison between Linear IRT Observed-Score Equating and Levine Observed-Score Equating under the Generalized Kernel Equating Framework

    Science.gov (United States)

    Chen, Haiwen

    2012-01-01

    In this article, linear item response theory (IRT) observed-score equating is compared under a generalized kernel equating framework with Levine observed-score equating for nonequivalent groups with anchor test design. Interestingly, these two equating methods are closely related despite being based on different methodologies. Specifically, when…

  5. Estimating Sample Size for Usability Testing

    Directory of Open Access Journals (Sweden)

    Alex Cazañas

    2017-02-01

    One strategy used to assure that an interface meets user requirements is to conduct usability testing. When conducting such testing, one of the unknowns is sample size. Since extensive testing is costly, minimizing the number of participants can contribute greatly to successful resource management of a project. Even though a significant number of models have been proposed to estimate sample size in usability testing, there is still no consensus on the optimal size. Several studies claim that 3 to 5 users suffice to uncover 80% of problems in a software interface. However, many other studies challenge this assertion. This study analyzed data collected from the user testing of a web application to verify the rule of thumb, commonly known as the “magic number 5”. The outcomes of the analysis showed that the 5-user rule significantly underestimates the required sample size to achieve reasonable levels of problem detection.
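
    The "5-user" rule of thumb is usually derived from a cumulative binomial discovery model, in which the expected share of problems found by n users is 1 - (1 - p)^n. The sketch below illustrates that general model only; it is not the analysis performed in this study, and the per-user detection probabilities (0.31 and 0.10) are assumed values chosen for illustration.

```python
import math


def discovery_rate(p: float, n: int) -> float:
    """Expected proportion of problems found by n users when each user
    independently detects any given problem with probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return 1.0 - (1.0 - p) ** n


def users_needed(p: float, target: float) -> int:
    """Smallest n for which discovery_rate(p, n) reaches the target."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Assumed detection probabilities, for illustration only.
for p in (0.31, 0.10):
    print(f"p={p}: 5 users find {discovery_rate(p, 5):.0%}; "
          f"{users_needed(p, 0.80)} users needed for 80%")
```

    The required sample size is sensitive to p: when problems are harder to detect, five users fall well short of 80% coverage, which is the kind of shortfall the study reports.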

  6. An Analysis of Cross Racial Identity Scale Scores Using Classical Test Theory and Rasch Item Response Models

    Science.gov (United States)

    Sussman, Joshua; Beaujean, A. Alexander; Worrell, Frank C.; Watson, Stevie

    2013-01-01

    Item response models (IRMs) were used to analyze Cross Racial Identity Scale (CRIS) scores. Rasch analysis scores were compared with classical test theory (CTT) scores. The partial credit model demonstrated a high goodness of fit and correlations between Rasch and CTT scores ranged from 0.91 to 0.99. CRIS scores are supported by both methods.…

  7. Sample Selectivity and the Validity of International Student Achievement Tests in Economic Research. NBER Working Paper No. 15867

    Science.gov (United States)

    Hanushek, Eric A.; Woessmann, Ludger

    2010-01-01

    Critics of international student comparisons argue that results may be influenced by differences in the extent to which countries adequately sample their entire student populations. In this research note, we show that larger exclusion and non-response rates are related to better country average scores on international tests, as are larger…

  8. The Fagerström Test for Nicotine Dependence in a Dutch sample of daily smokers and ex-smokers

    NARCIS (Netherlands)

    Vink, Jacqueline M.; Willemsen, Gonneke; Beem, A. Leo; Boomsma, Dorret I.

    2005-01-01

    We explored the performance of the Fagerström Test for Nicotine Dependence (FTND) in a sample of 1378 daily smokers and 1058 ex-smokers who participated in a survey study of the Netherlands Twin Register. FTND scores were higher for smokers than for ex-smokers. Nicotine dependence level was not

  9. Association testing for next-generation sequencing data using score statistics

    DEFF Research Database (Denmark)

    Skotte, Line; Korneliussen, Thorfinn Sand; Albrechtsen, Anders

    2012-01-01

    ... of genotype calls into account have been proposed; most require numerical optimization which for large-scale data is not always computationally feasible. We show that using a score statistic for the joint likelihood of observed phenotypes and observed sequencing data provides an attractive approach to association testing for next-generation sequencing data. The joint model accounts for the genotype classification uncertainty via the posterior probabilities of the genotypes given the observed sequencing data, which gives the approach higher power than methods based on called genotypes. This strategy remains computationally feasible due to the use of score statistics. As part of the joint likelihood, we model the distribution of the phenotypes using a generalized linear model framework, which works for both quantitative and discrete phenotypes. Thus, the method presented here is applicable to case-control studies...

  10. Stability of Scores on Super's Work Values Inventory-Revised

    Science.gov (United States)

    Leuty, Melanie E.

    2013-01-01

    Test-retest data on Super's Work Values Inventory-Revised for a group of predominantly White (N = 995) women (mean age = 23.5 years, SD = 8.07) and men (mean age = 21.5 years, SD = 5.80) showed stability in mean-level scores over a period of 1 year for the sample as a whole. However, low raw score and rank order stability coefficients…

  11. A Prorating Method for Estimating MMPI-2-RF Scores From MMPI Responses: Examination of Score Fidelity and Illustration of Empirical Utility in the PERSEREC Police Integrity Study Sample.

    Science.gov (United States)

    Tarescavage, Anthony M; Corey, David M; Ben-Porath, Yossef S

    2016-04-01

    The purpose of the current study was to identify Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF) correlates of police officer integrity violations and other problem behaviors in an archival database with original MMPI item responses and collateral information regarding integrity violations obtained for 417 male officers. In Study 1, we estimated MMPI-2-RF scores from the MMPI item pool (which includes approximately 80% of the MMPI-2-RF items) in a normative sample, a psychiatric inpatient sample, and a police officer sample, and conducted analyses that demonstrated the comparability of estimated and full scale scores for 41 of the 51 MMPI-2-RF scales. In Study 2, we correlated estimated MMPI-2-RF scores with information about subsequent integrity violations and problem behaviors from the integrity violation data set. Several meaningful associations were obtained, predominately with scales from the emotional, thought, and behavioral dysfunction domains of the MMPI-2-RF. Application of a correction for range restriction yielded substantially improved validity estimates. Finally, we calculated relative risk ratios for the statistically significant findings using cutoffs lower than 65T, which is traditionally used to identify clinically significant elevations, and found several meaningful relative risk ratios. © The Author(s) 2015.

  12. Effects of Analytical and Holistic Scoring Patterns on Scorer Reliability in Biology Essay Tests

    Science.gov (United States)

    Ebuoh, Casmir N.

    2018-01-01

    Literature revealed that the patterns/methods of scoring essay tests had been criticized for not being reliable and this unreliability is more likely to be more in internal examinations than in the external examinations. The purpose of this study is to find out the effects of analytical and holistic scoring patterns on scorer reliability in…

  13. Measuring Biological Age via Metabonomics: The Metabolic Age Score.

    Science.gov (United States)

    Hertel, Johannes; Friedrich, Nele; Wittfeld, Katharina; Pietzner, Maik; Budde, Kathrin; Van der Auwera, Sandra; Lohmann, Tobias; Teumer, Alexander; Völzke, Henry; Nauck, Matthias; Grabe, Hans Jörgen

    2016-02-05

    Chronological age is one of the most important risk factors for adverse clinical outcome. Still, two individuals at the same chronological age could have different biological aging states, leading to different individual risk profiles. Capturing this individual variance could constitute an even more powerful predictor enhancing prediction in age-related morbidity. Applying a nonlinear regression technique, we constructed a metabonomic measurement for biological age, the metabolic age score, based on urine data measured via (1)H NMR spectroscopy. We validated the score in two large independent population-based samples by revealing its significant associations with chronological age and age-related clinical phenotypes as well as its independent predictive value for survival over approximately 13 years of follow-up. Furthermore, the metabolic age score was prognostic for weight loss in a sample of individuals who underwent bariatric surgery. We conclude that the metabolic age score is an informative measurement of biological age with possible applications in personalized medicine.

  14. The Dysexecutive Questionnaire advanced: item and test score characteristics, 4-factor solution, and severity classification.

    Science.gov (United States)

    Bodenburg, Sebastian; Dopslaff, Nina

    2008-01-01

    The Dysexecutive Questionnaire (DEX; Behavioral Assessment of the Dysexecutive Syndrome, 1996) is a standardized instrument to measure possible behavioral changes as a result of the dysexecutive syndrome. Although initially intended only as a qualitative instrument, the DEX has also been used increasingly to address quantitative problems. Until now there have been no fundamental statistical analyses of the questionnaire's testing quality. The present study is based on an unselected sample of 191 patients with acquired brain injury and reports on the data relating to the quality of the items, the reliability and the factorial structure of the DEX. Item 3 displayed too great an item difficulty, whereas item 11 was not sufficiently discriminating. The DEX's reliability in self-rating is r = 0.85. In addition to presenting the statistical values of the tests, a clinical severity classification of the overall scores of the 4 factors found and of the questionnaire as a whole is carried out on the basis of quartile standards.

  15. Forging the Basis for Developing Protein-Ligand Interaction Scoring Functions.

    Science.gov (United States)

    Liu, Zhihai; Su, Minyi; Han, Li; Liu, Jie; Yang, Qifan; Li, Yan; Wang, Renxiao

    2017-02-21

    In structure-based drug design, scoring functions are widely used for fast evaluation of protein-ligand interactions. They are often applied in combination with molecular docking and de novo design methods. Since the early 1990s, a whole spectrum of protein-ligand interaction scoring functions have been developed. Regardless of their technical difference, scoring functions all need data sets combining protein-ligand complex structures and binding affinity data for parametrization and validation. However, data sets of this kind used to be rather limited in terms of size and quality. On the other hand, standard metrics for evaluating scoring function used to be ambiguous. Scoring functions are often tested in molecular docking or even virtual screening trials, which do not directly reflect the genuine quality of scoring functions. Collectively, these underlying obstacles have impeded the invention of more advanced scoring functions. In this Account, we describe our long-lasting efforts to overcome these obstacles, which involve two related projects. On the first project, we have created the PDBbind database. It is the first database that systematically annotates the protein-ligand complexes in the Protein Data Bank (PDB) with experimental binding data. This database has been updated annually since its first public release in 2004. The latest release (version 2016) provides binding data for 16 179 biomolecular complexes in PDB. Data sets provided by PDBbind have been applied to many computational and statistical studies on protein-ligand interaction and various subjects. In particular, it has become a major data resource for scoring function development. On the second project, we have established the Comparative Assessment of Scoring Functions (CASF) benchmark for scoring function evaluation. Our key idea is to decouple the "scoring" process from the "sampling" process, so scoring functions can be tested in a relatively pure context to reflect their quality. In our

  16. Test of a two-dimensional neutron spin analyzer

    International Nuclear Information System (INIS)

    Falus, Peter; Vorobiev, Alexei; Krist, Thomas

    2006-01-01

    The aim of this measurement was to test the new large-area spin polarization analyzer for the EVA-SERGIS beamline at Institute Laue Langevin (ILL). The spin analyzer, which was built in Berlin, selects one of the two spin states of a neutron beam of wavelength 5.5 Å impinging on a horizontal sample and reflected or scattered from the sample. The spin is analyzed for all neutrons scattered into a detector with an area of 190 mm × 190 mm positioned 2.7 m behind the sample, thus covering an angular interval of 4° × 4°. The tests were done at the HMI V14 beamline followed by tests at the EVA beamline at ILL. The transmission for the two spin components, the flipping ratio and small angle scattering were recorded while scanning the incoming beam on the analyzer. It was clearly visible that, due to the stacked construction, the intensity is blocked at regular intervals. Careful inspection shows that the transmission of the good spin component is more than 0.72 for 60% of the detector area and the corrected flipping ratio is more than 47 for 60% of the detector area. Although some small-angle scattering is visible, it is notable that this analyzer design has small scattering intensities.

  17. Test of a two-dimensional neutron spin analyzer

    Science.gov (United States)

    Falus, Péter; Vorobiev, Alexei; Krist, Thomas

    2006-11-01

    The aim of this measurement was to test the new large-area spin polarization analyzer for the EVA-SERGIS beamline at Institute Laue Langevin (ILL). The spin analyzer, which was built in Berlin selects one of the two spin states of a neutron beam of wavelength 5.5 Å impinging on a horizontal sample and reflected or scattered from the sample. The spin is analyzed for all neutrons scattered into a detector with an area of 190 mm×190 mm positioned 2.7 m behind the sample, thus covering an angular interval of 4°×4°. The tests were done at the HMI V14 beamline followed by tests at the EVA beamline at ILL. The transmission for the two spin components, the flipping ratio and small angle scattering were recorded while scanning the incoming beam on the analyzer. It was clearly visible, that due to the stacked construction the intensity is blocked at regular intervals. Careful inspection shows that the transmission of the good spin component is more than 0.72 for 60% of the detector area and the corrected flipping ratio is more than 47 for 60% of the detector area. Although some small-angle scattering is visible, it is notable that this analyzer design has small scattering intensities.

  18. Linkage analysis in nuclear families. 2: Relationship between affected sib-pair tests and lod score analysis.

    Science.gov (United States)

    Knapp, M; Seuchter, S A; Baur, M P

    1994-01-01

    It is believed that the main advantage of affected sib-pair tests is that their application requires no information about the underlying genetic mechanism of the disease. However, here it is proved that the mean test, which can be considered the most prominent of the affected sib-pair tests, is equivalent to lod score analysis for an assumed recessive mode of inheritance, irrespective of the true mode of the disease. Further relationships of certain sib-pair tests and lod score analysis under specific assumed genetic modes are investigated.

  19. Evaluation of alternative macroinvertebrate sampling techniques for use in a new tropical freshwater bioassessment scheme

    Directory of Open Access Journals (Sweden)

    Isabel Eleanor Moore

    2015-06-01

    Aim: The study aimed to determine the effectiveness of benthic macroinvertebrate dredge net sampling procedures as an alternative method to kick net sampling in tropical freshwater systems, specifically as an evaluation of sampling methods used in the Zambian Invertebrate Scoring System (ZISS) river bioassessment scheme. Tropical freshwater ecosystems are sometimes dangerous or inaccessible to sampling teams using traditional kick-sampling methods, so identifying an alternative procedure that produces similar results is necessary in order to collect data from a wide variety of habitats. Methods: Both kick and dredge nets were used to collect macroinvertebrate samples at 16 riverine sites in Zambia, ranging from backwaters and floodplain lagoons to fast flowing streams and rivers. The data were used to calculate ZISS, diversity (S: number of taxa present), and Average Score Per Taxon (ASPT) scores per site, using the two sampling methods to compare their sampling effectiveness. Environmental parameters, namely pH, conductivity, underwater photosynthetically active radiation (PAR), temperature, alkalinity, flow, and altitude, were also recorded and used in statistical analysis. Invertebrate communities present at the sample sites were determined using multivariate procedures. Results: Analysis of the invertebrate community and environmental data suggested that the testing exercise was undertaken in four distinct macroinvertebrate community types, supporting at least two quite different macroinvertebrate assemblages, and showing significant differences in habitat conditions. Significant correlations were found for all three bioassessment score variables between results acquired using the two methods, with dredge-sampling normally producing lower scores than did the kick net procedures. Linear regression models were produced in order to correct each biological variable score collected by a dredge net to a score similar to that of one collected by kick net

  20. A stage is a stage is a stage: a direct comparison of two scoring systems.

    Science.gov (United States)

    Dawson, Theo L

    2003-09-01

    L. Kohlberg (1969) argued that his moral stages captured a developmental sequence specific to the moral domain. To explore that contention, the author compared stage assignments obtained with the Standard Issue Scoring System (A. Colby & L. Kohlberg, 1987a, 1987b) and those obtained with a generalized content-independent stage-scoring system called the Hierarchical Complexity Scoring System (T. L. Dawson, 2002a), on 637 moral judgment interviews (participants' ages ranged from 5 to 86 years). The correlation between stage scores produced with the 2 systems was .88. Although standard issue scoring and hierarchical complexity scoring often awarded different scores up to Kohlberg's Moral Stage 2/3, from his Moral Stage 3 onward, scores awarded with the two systems predominantly agreed. The author explores the implications for developmental research.

  1. Derivation of Two Critical Appraisal Scores for Trainees to Evaluate Online Educational Resources: A METRIQ Study

    Directory of Open Access Journals (Sweden)

    Teresa M. Chan

    2016-09-01

    Introduction: Online education resources (OERs), like blogs and podcasts, increasingly augment or replace traditional medical education resources such as textbooks and lectures. Trainees’ ability to evaluate these resources is poor, and few quality assessment aids have been developed to assist them. This study aimed to derive a quality evaluation instrument for this purpose. Methods: We used a three-phase methodology. In Phase 1, a previously derived list of 151 OER quality indicators was reduced to 13 items using data from published consensus-building studies (of medical educators, expert podcasters, and expert bloggers) and subsequent evaluation by our team. In Phase 2, these 13 items were converted to seven-point Likert scales used by trainee raters (n=40) to evaluate 39 OERs. The reliability and usability of these 13 rating items was determined using responses from trainee raters, and top items were used to create two OER quality evaluation instruments. In Phase 3, these instruments were compared to an external certification process (the ALiEM AIR certification) and the gestalt evaluation of the same 39 blog posts by 20 faculty educators. Results: Two quality-evaluation instruments were derived with fair inter-rater reliability: the METRIQ-8 Score (Inter class correlation coefficient [ICC]=0.30, p<0.001) and the METRIQ-5 Score (ICC=0.22, p<0.001). Both scores, when calculated using the derivation data, correlated with educator gestalt (Pearson’s r=0.35, p=0.03 and r=0.41, p<0.01, respectively) and were related to increased odds of receiving an ALiEM AIR certification (odds ratio=1.28, p=0.03; OR=1.5, p=0.004, respectively). Conclusion: Two novel scoring instruments with adequate psychometric properties were derived to assist trainees in evaluating OER quality and correlated favourably with gestalt ratings of online educational resources by faculty educators. Further testing is needed to ensure these instruments are accurate when applied by

  2. Derivation of Two Critical Appraisal Scores for Trainees to Evaluate Online Educational Resources: A METRIQ Study

    Science.gov (United States)

    Chan, Teresa M.; Thoma, Brent; Krishnan, Keeth; Lin, Michelle; Carpenter, Christopher R.; Astin, Matt; Kulasegaram, Kulamakan

    2016-01-01

    Introduction: Online education resources (OERs), like blogs and podcasts, increasingly augment or replace traditional medical education resources such as textbooks and lectures. Trainees’ ability to evaluate these resources is poor, and few quality assessment aids have been developed to assist them. This study aimed to derive a quality evaluation instrument for this purpose. Methods: We used a three-phase methodology. In Phase 1, a previously derived list of 151 OER quality indicators was reduced to 13 items using data from published consensus-building studies (of medical educators, expert podcasters, and expert bloggers) and subsequent evaluation by our team. In Phase 2, these 13 items were converted to seven-point Likert scales used by trainee raters (n=40) to evaluate 39 OERs. The reliability and usability of these 13 rating items was determined using responses from trainee raters, and top items were used to create two OER quality evaluation instruments. In Phase 3, these instruments were compared to an external certification process (the ALiEM AIR certification) and the gestalt evaluation of the same 39 blog posts by 20 faculty educators. Results: Two quality-evaluation instruments were derived with fair inter-rater reliability: the METRIQ-8 Score (Inter class correlation coefficient [ICC]=0.30, p<0.001) and the METRIQ-5 Score (ICC=0.22, p<0.001). Both scores, when calculated using the derivation data, correlated with educator gestalt (Pearson’s r=0.35, p=0.03 and r=0.41, p<0.01, respectively) and were related to increased odds of receiving an ALiEM AIR certification (odds ratio=1.28, p=0.03; OR=1.5, p=0.004, respectively). Conclusion: Two novel scoring instruments with adequate psychometric properties were derived to assist trainees in evaluating OER quality and correlated favourably with gestalt ratings of online educational resources by faculty educators. Further testing is needed to ensure these instruments are accurate when applied by trainees.

  3. Recalibration of the ACC/AHA Risk Score in Two Population-Based German Cohorts.

    Science.gov (United States)

    de Las Heras Gala, Tonia; Geisel, Marie Henrike; Peters, Annette; Thorand, Barbara; Baumert, Jens; Lehmann, Nils; Jöckel, Karl-Heinz; Moebus, Susanne; Erbel, Raimund; Meisinger, Christine; Mahabadi, Amir Abbas; Koenig, Wolfgang

    2016-01-01

    The 2013 ACC/AHA guidelines introduced an algorithm for risk assessment of atherosclerotic cardiovascular disease (ASCVD) within 10 years. In Germany, risk assessment with the ESC SCORE is limited to cardiovascular mortality. Applicability of the novel ACC/AHA risk score to the German population has not yet been assessed. We therefore sought to recalibrate and evaluate the ACC/AHA risk score in two German cohorts and to compare it to the ESC SCORE. We studied 5,238 participants from the KORA surveys S3 (1994-1995) and S4 (1999-2001) and 4,208 subjects from the Heinz Nixdorf Recall (HNR) Study (2000-2003). There were 383 (7.3%) and 271 (6.4%) first non-fatal or fatal ASCVD events within 10 years in KORA and in HNR, respectively. Risk scores were evaluated in terms of calibration and discrimination performance. The original ACC/AHA risk score overestimated 10-year ASCVD rates by 37% in KORA and 66% in HNR. After recalibration, miscalibration diminished to 8% underestimation in KORA and 12% overestimation in HNR. Discrimination performance of the ACC/AHA risk score was not affected by the recalibration (KORA: C = 0.78, HNR: C = 0.74). The ESC SCORE overestimated by 5% in KORA and by 85% in HNR. The corresponding C-statistic was 0.82 in KORA and 0.76 in HNR. The recalibrated ACC/AHA risk score showed strongly improved calibration compared to the original ACC/AHA risk score. Predicting only cardiovascular mortality, discrimination performance of the commonly used ESC SCORE remained somewhat superior to the ACC/AHA risk score. Nevertheless, the recalibrated ACC/AHA risk score may provide a meaningful tool for estimating 10-year risk of fatal and non-fatal cardiovascular disease in Germany.
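    The two evaluation criteria named in this abstract, calibration and discrimination, can be illustrated with a short sketch. The Python below is not the authors' code: the function names, the definition of relative miscalibration as (predicted - observed) / observed, and the toy risks are all assumptions made for illustration.

```python
def relative_miscalibration(predicted_events, observed_events):
    """Relative over- (+) or underestimation (-) of the number of events.

    Assumed definition: (predicted - observed) / observed.
    """
    if observed_events <= 0:
        raise ValueError("observed_events must be positive")
    return (predicted_events - observed_events) / observed_events


def c_statistic(risks, events):
    """Concordance: share of (event, non-event) pairs in which the subject
    with the event received the higher predicted risk.
    Tied risks count as half-concordant."""
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for risk_case in cases:
        for risk_control in controls:
            if risk_case > risk_control:
                score += 1.0
            elif risk_case == risk_control:
                score += 0.5
    return score / (len(cases) * len(controls))


# Hypothetical toy data, not taken from KORA or HNR.
toy_risks = [0.30, 0.05, 0.20, 0.10]
toy_events = [1, 0, 0, 1]
print(c_statistic(toy_risks, toy_events))
print(relative_miscalibration(137, 100))
```

    Because the concordance depends only on the ordering of the predicted risks, any strictly increasing transformation of those risks leaves it unchanged, which is consistent with the abstract's report that recalibration did not affect discrimination.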

  4. Association of Health Sciences Reasoning Test scores with academic and experiential performance.

    Science.gov (United States)

    Cox, Wendy C; McLaughlin, Jacqueline E

    2014-05-15

    To assess the association of scores on the Health Sciences Reasoning Test (HSRT) with academic and experiential performance in a doctor of pharmacy (PharmD) curriculum. The HSRT was administered to 329 first-year (P1) PharmD students. Performance on the HSRT and its subscales was compared with academic performance in 29 courses throughout the curriculum and with performance in advanced pharmacy practice experiences (APPEs). Significant positive correlations were found between course grades in 8 courses and HSRT overall scores. All significant correlations were accounted for by pharmaceutical care laboratory courses, therapeutics courses, and a law and ethics course. There was a lack of moderate to strong correlation between HSRT scores and academic and experiential performance. The usefulness of the HSRT as a tool for predicting student success may be limited.

  5. Impact of Answer-Switching Behavior on Multiple-Choice Test Scores in Higher Education

    Directory of Open Access Journals (Sweden)

    Ramazan BAŞTÜRK

    2011-06-01

    The multiple-choice format is one of the most popular selected-response item formats used in educational testing. Researchers have shown that the multiple-choice test is a useful vehicle for student assessment in core university subjects that usually have large student numbers. Even though educators, test experts, and various test resources maintain that the first answer should be retained, many researchers have argued that this advice is not supported by empirical findings. The main aim of this study is to examine how answer-switching behavior affects multiple-choice test scores. Additionally, gender differences and the relationship between the number of answer switches and item parameters (item difficulty and item discrimination) were investigated. The participants in this study were 207 upper-level College of Education students from mid-sized universities. A midterm exam consisting of 20 multiple-choice questions was used. According to the results of this study, answer-switching behavior produces a statistically significant increase in test scores. On the other hand, there is no significant gender difference in answer-switching behavior. Additionally, there is a significant negative relationship between answer-switching behavior and item difficulty.

  6. Two-step calibration method for multi-algorithm score-based face recognition systems by minimizing discrimination loss

    NARCIS (Netherlands)

    Susyanto, N.; Veldhuis, R.N.J.; Spreeuwers, L.J.; Klaassen, C.A.J.; Fierrez, J.; Li, S.Z.; Ross, A.; Veldhuis, R.; Alonso-Fernandez, F.; Bigun, J.

    2016-01-01

    We propose a new method for combining multi-algorithm score-based face recognition systems, which we call the two-step calibration method. Typically, algorithms for face recognition systems produce dependent scores. The two-step method is based on parametric copulas to handle this dependence. Its

  7. Testing a groundwater sampling tool: Are the samples representative?

    International Nuclear Information System (INIS)

    Kaback, D.S.; Bergren, C.L.; Carlson, C.A.; Carlson, C.L.

    1989-01-01

    A ground water sampling tool, the HydroPunch™, was tested at the Department of Energy's Savannah River Site in South Carolina to determine if representative ground water samples could be obtained without installing monitoring wells. Chemical analyses of ground water samples collected with the HydroPunch™ from various depths within a borehole were compared with chemical analyses of ground water from nearby monitoring wells. The site selected for the test was in the vicinity of a large coal storage pile and a coal pile runoff basin that was constructed to collect the runoff from the coal storage pile. Existing monitoring wells in the area indicate the presence of a ground water contaminant plume that: (1) contains elevated concentrations of trace metals; (2) has an extremely low pH; and (3) contains elevated concentrations of major cations and anions. Ground water samples collected with the HydroPunch™ provide an excellent estimate of ground water quality at discrete depths. Ground water chemical data collected from various depths using the HydroPunch™ can be averaged to simulate what a screen zone in a monitoring well would sample. The averaged depth-discrete data compared favorably with the data obtained from the nearby monitoring wells.

  8. CaPTHUS scoring model in primary hyperparathyroidism: can it eliminate the need for ioPTH testing?

    Science.gov (United States)

    Elfenbein, Dawn M; Weber, Sara; Schneider, David F; Sippel, Rebecca S; Chen, Herbert

    2015-04-01

    The CaPTHUS model was reported to have a positive predictive value of 100 % to correctly predict single-gland disease in patients with primary hyperparathyroidism, thus obviating the need for intraoperative parathyroid hormone (ioPTH) testing. We sought to apply the CaPTHUS scoring model in our patient population and assess its utility in predicting long-term biochemical cure. We retrospectively reviewed all parathyroidectomies for primary hyperparathyroidism performed at our university hospital from 2003 to 2012. We routinely perform ioPTH testing. Biochemical cure was defined as a normal calcium level at 6 months. A total of 1,421 patients met the inclusion criteria: 78 % of patients had a single adenoma at the time of surgery, 98 % had a normal serum calcium at 1 week postoperatively, and 96 % had a normal serum calcium level 6 months postoperatively. Using the CaPTHUS scoring model, 307 patients (22.5 %) had a score of ≥ 3, with a positive predictive value of 91 % for single adenoma. A CaPTHUS score of ≥ 3 had a positive predictive value of 98 % for biochemical cure at 1 week as well as at 6 months. In our population, where ioPTH testing is used routinely to guide use of bilateral exploration, patients with a preoperative CaPTHUS score of ≥ 3 had good long-term biochemical cure rates. However, the model only predicted adenoma in 91 % of cases. If minimally invasive parathyroidectomy without ioPTH testing had been done for these patients, the cure rate would have dropped from 98 % to an unacceptable 89 %. Even in these patients with high CaPTHUS scores, multigland disease is present in almost 10 %, and ioPTH testing is necessary.

  9. Interpreting force concept inventory scores: Normalized gain and SAT scores

    Directory of Open Access Journals (Sweden)

    Jeffrey J. Steinert

    2007-05-01

    Preinstruction SAT scores and normalized gains (G) on the force concept inventory (FCI) were examined for individual students in interactive engagement (IE) courses in introductory mechanics at one high school (N=335) and one university (N=292), and strong, positive correlations were found for both populations (r=0.57 and r=0.46, respectively). These correlations are likely due to the importance of cognitive skills and abstract reasoning in learning physics. The larger correlation coefficient for the high school population may be a result of the much shorter time interval between taking the SAT and studying mechanics, because the SAT may provide a more current measure of abilities when high school students begin the study of mechanics than it does for college students, who begin mechanics years after the test is taken. In prior research a strong correlation between FCI G and scores on Lawson’s Classroom Test of Scientific Reasoning for students from the same two schools was observed. Our results suggest that, when interpreting class average normalized FCI gains and comparing different classes, it is important to take into account the variation of students’ cognitive skills, as measured either by the SAT or by Lawson’s test. While Lawson’s test is not commonly given to students in most introductory mechanics courses, SAT scores provide a readily available alternative means of taking account of students’ reasoning abilities. Knowing the students’ cognitive level before instruction also allows one to alter instruction or to use an intervention designed to improve students’ cognitive level.

  10. Interpreting force concept inventory scores: Normalized gain and SAT scores

    Directory of Open Access Journals (Sweden)

    Vincent P. Coletta

    2007-05-01

    Preinstruction SAT scores and normalized gains (G) on the force concept inventory (FCI) were examined for individual students in interactive engagement (IE) courses in introductory mechanics at one high school (N=335) and one university (N=292), and strong, positive correlations were found for both populations (r=0.57 and r=0.46, respectively). These correlations are likely due to the importance of cognitive skills and abstract reasoning in learning physics. The larger correlation coefficient for the high school population may be a result of the much shorter time interval between taking the SAT and studying mechanics, because the SAT may provide a more current measure of abilities when high school students begin the study of mechanics than it does for college students, who begin mechanics years after the test is taken. In prior research a strong correlation between FCI G and scores on Lawson’s Classroom Test of Scientific Reasoning for students from the same two schools was observed. Our results suggest that, when interpreting class average normalized FCI gains and comparing different classes, it is important to take into account the variation of students’ cognitive skills, as measured either by the SAT or by Lawson’s test. While Lawson’s test is not commonly given to students in most introductory mechanics courses, SAT scores provide a readily available alternative means of taking account of students’ reasoning abilities. Knowing the students’ cognitive level before instruction also allows one to alter instruction or to use an intervention designed to improve students’ cognitive level.
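    The abstract does not spell out how the normalized gain G is computed. A minimal sketch follows, assuming the commonly used definition G = (post - pre) / (100 - pre) with scores in percent, together with a plain Pearson correlation. The function names and the three-student data set are hypothetical and serve only to show the arithmetic.

```python
import math


def normalized_gain(pre_percent, post_percent):
    """Fraction of the possible improvement that was actually achieved.

    Assumed definition: G = (post - pre) / (100 - pre), scores in percent.
    """
    if not 0.0 <= pre_percent < 100.0:
        raise ValueError("pre-test score must be in [0, 100)")
    return (post_percent - pre_percent) / (100.0 - pre_percent)


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical students: (SAT score, FCI pre %, FCI post %).
students = [(900, 20.0, 40.0), (1100, 30.0, 65.0), (1300, 40.0, 85.0)]
sat_scores = [s[0] for s in students]
gains = [normalized_gain(pre, post) for _, pre, post in students]
print(gains)
print(pearson_r(sat_scores, gains))
```

    Normalizing by the room left for improvement is what lets gains be compared across students or classes that start from different pre-test levels.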

  11. [The Amsterdam Dementia Screening Test in cognitively healthy and clinical samples. An update of normative data].

    Science.gov (United States)

    van Toutert, Meta; Diesfeldt, Han; Hoek, Dirk

    2016-10-01

    The six tests in the Amsterdam Dementia Screening Test (ADST) examine the cognitive domains of episodic memory (delayed picture recognition, word learning), orientation, category fluency (animals and occupations), constructional ability (figure copying) and executive function (alternating sequences). New normative data were collected in a sample of 102 elderly volunteers (aged 65-94), including subjects with medical or other health conditions, except dementia or frank cognitive impairment (MMSE > 24). Included subjects were independent in complex instrumental activities of daily living. Fluency, not the other tests, needed adjustment for age and education. A deficit score (0-1) was computed for each test. Summation (range 0-6) proved useful in differentiating patients with dementia (N = 741) from normal elderly (N = 102). Positive and negative predictive power across a range of summed deficit scores and base rates are displayed in Bayesian probability tables. In the normal elderly, delayed recall for eight words was tested and adjusted for initial recall. A recognition test mixed the target words with eight distractors. Delayed recognition was adjusted for immediate and delayed recall. The ADST and the normative data in this paper help the clinical neuropsychologist to make decisions concerning the presence or absence of neurocognitive disorder in individual elderly examinees.
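    The summed deficit score and the dependence of predictive power on base rate can be sketched in a few lines. This is an illustration under stated assumptions, not the ADST procedure itself: the function names are invented, and the sensitivity and specificity values are hypothetical rather than taken from the paper.

```python
def summed_deficit_score(deficits):
    """Add up per-test deficit flags (each 0 or 1); six tests give 0-6."""
    if any(flag not in (0, 1) for flag in deficits):
        raise ValueError("each deficit score must be 0 or 1")
    return sum(deficits)


def predictive_values(sensitivity, specificity, base_rate):
    """Positive and negative predictive value via Bayes' theorem."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    false_neg = (1.0 - sensitivity) * base_rate
    true_neg = specificity * (1.0 - base_rate)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


print(summed_deficit_score([1, 0, 1, 1, 0, 0]))

# Hypothetical sensitivity/specificity for some cut-off on the summed score.
for base_rate in (0.1, 0.3, 0.5):
    ppv, npv = predictive_values(0.80, 0.90, base_rate)
    print(f"base rate {base_rate:.1f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

    Tabulating these values over several cut-offs and base rates gives the kind of probability table the abstract refers to: the same cut-off yields a lower positive predictive value in settings where dementia is rare.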

  12. Score Gains on g-loaded Tests: No g

    NARCIS (Netherlands)

    te Nijenhuis, J.; van Vianen, A.E.M.; van der Flier, H.

    2007-01-01

    IQ scores provide the best general predictor of success in education, job training, and work. However, there are many ways in which IQ scores can be increased, for instance by means of retesting or participation in learning potential training programs. What is the nature of these score gains? Jensen

  13. Lord-Wingersky Algorithm Version 2.0 for Hierarchical Item Factor Models with Applications in Test Scoring, Scale Alignment, and Model Fit Testing.

    Science.gov (United States)

    Cai, Li

    2015-06-01

    Lord and Wingersky's (Appl Psychol Meas 8:453-461, 1984) recursive algorithm for creating summed score based likelihoods and posteriors has a proven track record in unidimensional item response theory (IRT) applications. Extending the recursive algorithm to handle multidimensionality is relatively simple, especially with fixed quadrature because the recursions can be defined on a grid formed by direct products of quadrature points. However, the increase in computational burden remains exponential in the number of dimensions, making the implementation of the recursive algorithm cumbersome for truly high-dimensional models. In this paper, a dimension reduction method that is specific to the Lord-Wingersky recursions is developed. This method can take advantage of the restrictions implied by hierarchical item factor models, e.g., the bifactor model, the testlet model, or the two-tier model, such that a version of the Lord-Wingersky recursive algorithm can operate on a dramatically reduced set of quadrature points. For instance, in a bifactor model, the dimension of integration is always equal to 2, regardless of the number of factors. The new algorithm not only provides an effective mechanism to produce summed score to IRT scaled score translation tables properly adjusted for residual dependence, but leads to new applications in test scoring, linking, and model fit checking as well. Simulated and empirical examples are used to illustrate the new applications.
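    For orientation, the original unidimensional recursion that this paper extends can be written compactly. The sketch below covers only the classic dichotomous-item case at a single fixed ability value; it is not the dimension-reduced "Version 2.0" algorithm, and the 2PL item parameters are hypothetical.

```python
import math


def two_pl(theta, slope, intercept):
    """Probability of a correct response under a 2PL model (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(slope * theta + intercept)))


def lord_wingersky(p_correct):
    """Return [P(summed score = s | theta) for s = 0..n].

    p_correct holds each item's success probability at one fixed theta.
    Items are added one at a time; each step splits the current
    distribution into 'item wrong' (score unchanged) and 'item right'
    (score + 1).
    """
    dist = [1.0]  # with zero items, the summed score is 0 with certainty
    for p in p_correct:
        updated = [0.0] * (len(dist) + 1)
        for score, prob in enumerate(dist):
            updated[score] += prob * (1.0 - p)
            updated[score + 1] += prob * p
        dist = updated
    return dist


# Hypothetical (slope, intercept) pairs for three items, evaluated at theta = 0.
items = [(1.0, 0.0), (1.5, -0.5), (0.8, 0.7)]
probs = [two_pl(0.0, a, c) for a, c in items]
likelihood = lord_wingersky(probs)
print(likelihood)
```

    Repeating the recursion at each quadrature point and weighting by the prior gives summed-score posteriors. In a multidimensional model the grid of points grows exponentially with the number of dimensions, which is the cost the abstract's dimension reduction is designed to avoid.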

  14. Attributes of diagnostic tests to increase uptake of dual testing for syphilis and HIV in Port-au-Prince, Haiti.

    Science.gov (United States)

    Bristow, Claire C; Lee, Sung-Jae; Severe, Linda; William Pape, Jean; Javanbakht, Marjan; Scott Comulada, Warren; Klausner, Jeffrey D

    2017-03-01

    Introduction: Syphilis and HIV screening is highly recommended for pregnant women and those at risk for infection. We used conjoint analysis to identify factors associated with testing preferences for HIV and syphilis infection. Methods: We recruited 298 men and women 18 years and over seeking testing or care at GHESKIO (Haitian Study Group for Kaposi's Sarcoma and Opportunistic Infections) clinics. We created eight hypothetical dual HIV-syphilis test profiles varying across six dichotomous attributes. Participants were asked to rate each profile using Likert preference scales. An impact score was generated for each attribute by taking the difference between the preference scores for the preferred and non-preferred level of each attribute. Two-sided one-sample t-test was used to generate p values. Results: Of 298 study participants, 61 (20.5%) were male. Of 237 females, 49 (20.7%) were pregnant. Cost (free vs. US$4; p syphilis testing preferences for this study sample in Port-au-Prince prioritized cost, single fingerprick, laboratory-based testing and timeliness.
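    The impact-score calculation described in the Methods can be sketched as follows. This is a simplified reading, assuming one pair of preference scores per participant for a single attribute; the function names and ratings are hypothetical, and only the t statistic is computed (turning it into a two-sided p value needs a t-distribution routine, which is left out here).

```python
import math
import statistics


def impact_scores(preferred, non_preferred):
    """Per-participant difference: preferred-level minus non-preferred-level score."""
    if len(preferred) != len(non_preferred):
        raise ValueError("score lists must have equal length")
    return [p - q for p, q in zip(preferred, non_preferred)]


def one_sample_t(values, mu0=0.0):
    """t statistic and degrees of freedom for H0: mean(values) == mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    return (mean - mu0) / std_err, n - 1


# Hypothetical Likert preference scores for one attribute (e.g. cost).
free_test = [4, 5, 5]
paid_test = [3, 3, 2]
diffs = impact_scores(free_test, paid_test)
t_stat, dof = one_sample_t(diffs)
print(diffs, t_stat, dof)
```

    An impact score that differs significantly from zero indicates that participants' ratings shift with the level of that attribute.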

  15. Optimization of Sample Preparation for the Identification and Quantification of Saxitoxin in Proficiency Test Mussel Sample using Liquid Chromatography-Tandem Mass Spectrometry

    Directory of Open Access Journals (Sweden)

    Kirsi Harju

    2015-11-01

    Saxitoxin (STX) and some selected paralytic shellfish poisoning (PSP) analogues in mussel samples were identified and quantified with liquid chromatography-tandem mass spectrometry (LC-MS/MS). Sample extraction and purification methods for the mussel samples were optimized for LC-MS/MS analysis. The developed method was applied to the analysis of the homogenized mussel samples in the proficiency test (PT) within the EQuATox project (Establishment of Quality Assurance for the Detection of Biological Toxins of Potential Bioterrorism Risk). Ten laboratories from eight countries participated in the STX PT. Identification of PSP toxins in naturally contaminated mussel samples was performed by comparison of product ion spectra and retention times with those of reference standards. The quantitative results were obtained with LC-MS/MS by spiking reference standards in toxic mussel extracts. The results were within the z-score of ±1 when compared to the results measured with the official AOAC (Association of Official Analytical Chemists) method 2005.06, pre-column oxidation high-performance liquid chromatography with fluorescence detection (HPLC-FLD).

  16. The Addenbrooke's Cognitive Examination Revised (ACE-R) and its sub-scores: normative values in an Italian population sample.

    Science.gov (United States)

    Siciliano, Mattia; Raimo, Simona; Tufano, Dario; Basile, Giuseppe; Grossi, Dario; Santangelo, Franco; Trojano, Luigi; Santangelo, Gabriella

    2016-03-01

    The Addenbrooke's Cognitive Examination Revised (ACE-R) is a rapid screening battery, including five sub-scales to explore different cognitive domains: attention/orientation, memory, fluency, language and visuospatial. ACE-R is considered useful in discriminating cognitively normal subjects from patients with mild dementia. The aim of the present study was to provide normative values for ACE-R total score and sub-scale scores in a large sample of Italian healthy subjects. Five hundred twenty-six Italian healthy subjects (282 women and 246 men) of different ages (age range 20-93 years) and educational levels (from primary school to university) underwent ACE-R and Montreal Cognitive Assessment (MoCA). Multiple linear regression analysis revealed that age and education significantly influenced performance on ACE-R total score and sub-scale scores. A significant effect of gender was found only in the attention/orientation sub-scale. From the derived linear equation, a correction grid for raw scores was built. Inferential cut-off scores were estimated using a non-parametric technique and equivalent scores (ES) were computed. Correlation analysis showed a good, significant correlation between ACE-R adjusted scores and MoCA adjusted scores (r = 0.612, p < 0.001). The present study provided normative data for the ACE-R in an Italian population, useful for both clinical and research purposes.

  17. REIMEP-22 inter-laboratory comparison. "U Age Dating - determination of the production date of a uranium certified test sample"

    Energy Technology Data Exchange (ETDEWEB)

    Venchiarutti, Celia; Richter, Stephan; Jakopic, Rozle; Aregbe, Yetunde [European Commission, Joint Research Centre (JRC), Geel (Belgium). Institute for Reference Materials and Measurements (IRMM); Varga, Zsolt; Mayer, Klaus [European Commission, Joint Research Centre (JRC), Karlsruhe (Germany). Institute for Transuranium Elements (ITU)

    2015-07-01

    The REIMEP-22 inter-laboratory comparison aimed at determining the production date of a uranium certified test sample (i.e. the last chemical separation date of the material). Participants in REIMEP-22 on "U Age Dating - Determination of the production date of a uranium certified test sample" received one low-enriched 20 mg uranium sample for mass spectrometry measurements and/or one 50 mg uranium sample for α-spectrometry measurements, with an undisclosed value for the production date. They were asked to report the isotope amount ratios n(²³⁰Th)/n(²³⁴U) for the 20 mg uranium sample and/or the activity ratios A(²³⁰Th)/A(²³⁴U) for the 50 mg uranium sample in addition to the calculated production date of the certified test samples with its uncertainty. Reporting of the ²³¹Pa/²³⁵U ratio and the respective calculated production date was optional. Eleven laboratories reported results in REIMEP-22. Two of them reported results for both the 20 mg and 50 mg uranium certified test samples. The measurement capability of the participants was assessed against the independent REIMEP-22 reference value by means of z- and zeta-scores in compliance with ISO 13528:2005. Furthermore a performance assessment criterion for acceptable uncertainty was applied to evaluate the participants' results. In general, the REIMEP-22 participants' results were satisfactory. This confirms the analytical capabilities of laboratories to determine accurately the age of uranium materials with low amount of ingrown thorium (young certified test sample). The Joint Research Centre of the European Commission (EC-JRC) organised REIMEP-22 in parallel to the preparation and certification of a uranium reference material certified for the production date (IRMM-1000a and IRMM-1000b).
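
    The two performance scores named above are commonly defined as z = (x - X) / σ_p and zeta = (x - X) / sqrt(u_x² + u_X²), where x is the laboratory result, X the reference value, σ_p the standard deviation for proficiency assessment, and u_x, u_X the standard uncertainties of the result and the reference value. The sketch below uses those definitions with the usual 2 and 3 interpretation limits; the function names are illustrative, and the specific σ_p and uncertainty values used in REIMEP-22 are not given in the abstract.

```python
import math


def z_score(x, x_ref, sigma_p):
    """Deviation from the reference value in units of the assessment SD."""
    return (x - x_ref) / sigma_p


def zeta_score(x, x_ref, u_x, u_ref):
    """Deviation from the reference value in units of the combined standard uncertainty."""
    return (x - x_ref) / math.hypot(u_x, u_ref)


def classify(score):
    """Conventional interpretation bands for z- and zeta-scores."""
    magnitude = abs(score)
    if magnitude <= 2.0:
        return "satisfactory"
    if magnitude < 3.0:
        return "questionable"
    return "unsatisfactory"
```

    A zeta-score outside the satisfactory band while the z-score is inside it usually points to an underestimated reported uncertainty rather than a biased result.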

  18. CHARACTERIZATION AND ACTUAL WASTE TEST WITH TANK 5F SAMPLES

    International Nuclear Information System (INIS)

    Fletcher, D.

    2007-01-01

    The initial phase of bulk waste removal operations was recently completed in Tank 5F. Video inspection of the tank indicates several mounds of sludge still remain in the tank. Additionally, a mound of white solids was observed under Riser 5. In support of chemical cleaning and heel removal programs, samples of the sludge and the mound of white solids were obtained from the tank for characterization and testing. A core sample of the sludge and Super Snapper sample of the white solids were characterized. A supernate dip sample from Tank 7F was also characterized. A portion of the sludge was used in two tank cleaning tests using oxalic acid at 50 °C and 75 °C. The filtered oxalic acid from the tank cleaning tests was subsequently neutralized by addition to a simulated Tank 7F supernate. Solids and liquid samples from the tank cleaning test and neutralization test were characterized. A separate report documents the results of the gas generation from the tank cleaning test using oxalic acid and Tank 5F sludge. The characterization results for the Tank 5F sludge sample (FTF-05-06-55) appear quite good with respect to the tight precision of the sample replicates, good results for the glass standards, and minimal contamination found in the blanks and glass standards. The aqua regia and sodium peroxide fusion data also show good agreement between the two dissolution methods. Iron dominates the sludge composition with other major contributors being uranium, manganese, nickel, sodium, aluminum, and silicon. The low sodium value for the sludge reflects the absence of supernate present in the sample due to the core sampler employed for obtaining the sample. The XRD and CSEM results for the Super Snapper salt sample (i.e., white solids) from Tank 5F (FTF-05-07-1) indicate the material contains hydrated sodium carbonate and bicarbonate salts along with some aluminum hydroxide. These compounds likely precipitated from the supernate in the tank. A solubility test showed the material

  19. Effect on intelligence test score of prenatal exposure to ionizing radiation in Hiroshima and Nagasaki

    International Nuclear Information System (INIS)

    Schull, W.J.; Otake, Masanori; Yoshimaru, Hiroshi.

    1988-10-01

    Analyses of intelligence test scores (Koga) at 10-11 years of age of individuals exposed prenatally to the atomic bombing of Hiroshima and Nagasaki using estimates of the uterine absorbed dose based on the recently introduced system of dosimetry, the Dosimetry System 1986 (DS86), reveal the following: 1) there is no evidence of a radiation-related effect on intelligence among those individuals exposed within 0-7 weeks after fertilization or in the 26th or subsequent weeks; 2) for individuals exposed at 8-15 weeks after fertilization, and to a lesser extent those exposed at 16-25 weeks, the mean test scores but not the variances are significantly heterogeneous among exposure categories; 3) the cumulative distribution of test scores suggests a progressive shift downwards in individual scores with increasing exposure; and 4) within the group most sensitive to the occurrence of clinically recognizable severe mental retardation, individuals exposed 8 through 15 weeks after fertilization, the regression of intelligence score on estimated DS86 uterine absorbed dose is more linear than with T65DR fetal dose; the diminution in intelligence score under the linear model is 21-29 points at 1 Gy. The effect is somewhat greater when the controls receiving less than 0.01 Gy are excluded, 24-33 points at 1 Gy. These findings are discussed in the light of the earlier analysis of the frequency of occurrence of mental retardation among the prenatally exposed survivors of the A-bombing of Hiroshima and Nagasaki. It is suggested that both are the consequences of the same underlying biological process or processes. (author)

  20. Associations of maximal strength and muscular endurance test scores with cardiorespiratory fitness and body composition.

    Science.gov (United States)

    Vaara, Jani P; Kyröläinen, Heikki; Niemi, Jaakko; Ohrankämmen, Olli; Häkkinen, Arja; Kocay, Sheila; Häkkinen, Keijo

    2012-08-01

    The purpose of the present study was to assess the relationships between maximal strength and muscular endurance test scores in addition to previously widely studied measures of body composition and maximal aerobic capacity. A total of 846 young men (25.5 ± 5.0 yrs) participated in the study. Maximal strength was measured using isometric bench press, leg extension and grip strength. Muscular endurance tests consisted of push-ups, sit-ups and repeated squats. An indirect graded cycle ergometer test was used to estimate maximal aerobic capacity (V(O2)max). Body composition was determined with bioelectrical impedance. Moreover, waist circumference (WC) and height were measured and body mass index (BMI) calculated. Maximal bench press was positively correlated with push-ups (r = 0.61, p [...] strength (r = 0.34, p [...] strength correlated positively (r = 0.36-0.44, p [...] test scores were related to maximal aerobic capacity and body fat content, while fat free mass was associated with maximal strength test scores and thus is a major determinant for maximal strength. A contributive role of maximal strength to muscular endurance tests could be identified for the upper, but not the lower extremities. These findings suggest that the push-up test is not only indicative of body fat content and maximal aerobic capacity but also maximal strength of the upper body, whereas the repeated squat test is mainly indicative of body fat content and maximal aerobic capacity, but not maximal strength of the lower extremities.

  1. Associations between MMPI-2-RF validity scale scores and extra-test measures of personality and psychopathology.

    Science.gov (United States)

    Forbey, Johnathan D; Lee, Tayla T C; Ben-Porath, Yossef S; Arbisi, Paul A; Gartland, Diane

    2013-08-01

    The current study explored associations between two potentially invalidating self-report styles detected by the Validity scales of the Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF), over-reporting and under-reporting, and scores on the MMPI-2-RF substantive scales, as well as eight collateral self-report measures administered either at the same time or within 1 to 10 days of MMPI-2-RF administration. Analyses were conducted with data provided by college students, male prisoners, and male psychiatric outpatients from a Veterans Administration facility. Results indicated that if either an over- or under-reporting response style was suggested by the MMPI-2-RF Validity scales, scores on the majority of the MMPI-2-RF substantive scales, as well as a number of collateral measures, were significantly affected in all three groups in the expected directions. Test takers who were identified as potentially engaging in an over- or under-reporting response style by the MMPI-2-RF Validity scales appeared to approach extra-test measures similarly regardless of when these measures were administered in relation to the MMPI-2-RF. Limitations and suggestions for future study are discussed.

  2. Gaze Stabilization Test Asymmetry Score as an Indicator of Previous Concussion in a Cohort of Collegiate Football Players.

    Science.gov (United States)

    Honaker, Julie A; Criter, Robin E; Patterson, Jessie N; Jones, Sherri M

    2015-07-01

    Vestibular dysfunction may lead to decreased visual acuity with head movements, which may impede athletic performance and result in injury. The purpose of this study was to test the hypothesis that athletes with a history of concussion would have differences in gaze stabilization test (GST) as compared with those without a history of concussion. Cross-sectional, descriptive. University Athletic Medicine Facility. Fifteen collegiate football players with a history of concussion, 25 collegiate football players without a history of concussion. Participants completed the dizziness handicap inventory (DHI), static visual acuity, perception time test, active yaw plane GST, stability evaluation test (SET), and a bedside oculomotor examination. An independent samples t test was used to compare GST, SET, and DHI scores per group, with Bonferroni-adjusted alpha at P [...] history of concussion. The results support further research on the use of GST for sport-related concussion evaluation and monitoring. Inclusion of objective vestibular tests in the concussion protocol may reveal the presence of peripheral vestibular or visual-vestibular deficits. Therefore, the GST may add an important perspective on the effects of concussion.

  3. Tank 241-AZ-101 Mixer Pump Test Vapor Sampling and Analysis Plan

    International Nuclear Information System (INIS)

    TEMPLETON, A.M.

    2000-01-01

    This sampling and analysis plan (SAP) identifies characterization objectives pertaining to sample collection, laboratory analytical evaluation, and reporting requirements for vapor samples obtained during the operation of mixer pumps in tank 241-AZ-101. The primary purpose of the mixer pump test (MPT) is to demonstrate that the two 300 horsepower mixer pumps installed in tank 241-AZ-101 can mobilize the settled sludge so that it can be retrieved for treatment and vitrification. Sampling will be performed in accordance with Tank 241-AZ-101 Mixer Pump Test Data Quality Objective (Banning 1999) and Data Quality Objectives for Regulatory Requirements for Hazardous and Radioactive Air Emissions Sampling and Analysis (Mulkey 1999). The sampling will verify whether the current air emission estimates used in the permit application are correct and provide information for future air permit applications.

  4. Applying cognitive acuity theory to the development and scoring of situational judgment tests.

    Science.gov (United States)

    Leeds, J Peter

    2017-11-09

    The theory of cognitive acuity (TCA) treats the response options within items as signals to be detected and uses psychophysical methods to estimate the respondents' sensitivity to these signals. Such a framework offers new methods to construct and score situational judgment tests (SJT). Leeds (2012) defined cognitive acuity as the capacity to discern correctness and distinguish between correctness differences among simultaneously presented situation-specific response options. In this study, SJT response options were paired in order to offer the respondent a two-option choice. The contrast in correctness valence between the two options determined the magnitude of signal emission, with larger signals portending a higher probability of detection. A logarithmic relation was found between correctness valence contrast (signal stimulus) and its detectability (sensation response). Respondent sensitivity to such signals was measured and found to be related to the criterion variables. The linkage between psychophysics and elemental psychometrics may offer new directions for measurement theory.

  5. Translation and Adaptation of Knee Injury and Osteoarthritis Outcome Score (KOOS) into Persian and Testing Persian Version Reliability Among Iranians with Osteoarthritis

    Directory of Open Access Journals (Sweden)

    Solaleh Saraei-Pour

    2007-04-01

    Objective: To obtain a reliable tool for measuring health-related quality of life among Iranians with knee osteoarthritis by translating and culturally adapting the Knee injury and Osteoarthritis Outcome Score (KOOS) to Persian and testing the reliability and internal consistency of the Iranian version. Materials & Methods: This was a non-experimental methodological study. The KOOS was translated and culturally adapted to the Persian language and culture in three phases, following the IQOLA project. To examine test-retest reliability, the Iranian version of the KOOS was completed twice, at an interval of at least two days and at most one week, by 30 Iranian people with knee OA who were referred to Municipality and 110 physiotherapy clinics of Tehran with a PT order from physicians. Convenience (non-probability) sampling was used. Psychometric evaluation: The data collected from the questionnaires were scored and analyzed with SPSS software for test-retest reliability, absolute reliability, and subscale and item internal consistency. Results: Internal consistency, calculated with Cronbach's α, was high for all the subscales (at least 0.76), except for the "symptom" subscale, for which it was moderate, and showed that the items of each subscale measured the same construct. Item internal consistency after correction for overlap was higher than the optimal value (0.4), demonstrating good item internal consistency, except for the items of the "symptom" subscale. SEM and ICC, which were used to evaluate absolute and test-retest reliability respectively, showed that all the subscales had good test-retest reliability (0.7) and that absolute reliability was also very good: the highest calculated SEM for the Persian version was 7.44, which is less than the Minimal Perceptible Clinical Improvement (MPCI), estimated at 8 to 10 for the KOOS questionnaire. Conclusion: With the Persian
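
    For readers unfamiliar with the two reliability statistics reported above, a minimal sketch follows. It uses the textbook formulas, Cronbach's α = k/(k-1) · (1 - Σ item variances / variance of total scores) and SEM = SD · sqrt(1 - reliability); whether the study computed SEM in exactly this way is an assumption, and the function names and data layout (one row of item scores per respondent) are chosen for illustration only.

```python
import math
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha; rows[i][j] is respondent i's score on item j."""
    k = len(rows[0])
    item_vars = [statistics.variance([r[j] for r in rows]) for j in range(k)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(sd, reliability):
    """SEM from the score SD and a reliability coefficient such as the ICC."""
    return sd * math.sqrt(1.0 - reliability)
```

    Comparing the SEM with a minimal perceptible change threshold, as the abstract does with the 8 to 10 point range for the KOOS, indicates whether measurement noise is small enough for an individual change of that size to be detectable.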

  6. Credit concession through credit scoring: Analysis and application proposal

    Directory of Open Access Journals (Sweden)

    Oriol Amat

    2017-01-01

    Purpose: The study herein develops and tests a credit scoring model which can help financial institutions in assessing credit requests. Design/methodology/approach: The empirical study has the objective of answering two questions: (1) Which ratios better discriminate the companies based on their being solvent or insolvent? and (2) What is the relative importance of these ratios? To do this, several statistical techniques with a multifactorial focus have been used (Multivariate Analysis of Variance, Linear Discriminant Analysis, Logit and Probit Models). Several samples of companies have been used in order to obtain and to test the model. Findings: Through the application of several statistical techniques, the credit scoring model has been proved to be effective in discriminating between good and bad creditors. Research limitations: This study focuses on manufacturing, commercial and services companies of all sizes in Spain; therefore, the conclusions may differ for other geographical locations. Practical implications: Because credit is one of the main drivers of growth, a solid credit scoring model can help financial institutions assess to whom to grant credit and to whom not to grant credit. Social implications: Because of the growing importance of credit for our society and the fear of granting it due to the latest financial turmoil, a solid credit scoring model can strengthen trust in financial institutions' assessments. Originality/value: There is already a stream of literature related to credit scoring. However, this paper focuses on Spanish firms and proves the results of our model based on real data. The application of the model to detect the probability of default in loans is original.

  7. Pre-test probability risk scores and their use in contemporary management of patients with chest pain: One year stress echo cohort study

    Science.gov (United States)

    Demarco, Daniela Cassar; Papachristidis, Alexandros; Roper, Damian; Tsironis, Ioannis; Byrne, Jonathan; Monaghan, Mark

    2015-01-01

    Objectives: To compare how patients with stable chest pain would be investigated under the two guidelines available to UK cardiologists: the UK National Institute of Clinical Excellence (NICE) guideline, published in 2010, and the European Society of Cardiology (ESC) guideline, published in 2013. Both guidelines utilise pre-test probability risk scores to guide the choice of investigation. Design: We undertook a large retrospective study to investigate the outcomes of stress echocardiography. Setting: A large tertiary centre in the UK in a contemporary clinical practice. Participants: Two thirds of the patients in the cohort were referred from our rapid access chest pain clinics. Results: We found that the NICE risk score overestimates risk by 20% compared to the ESC risk score. We also found that, based on the NICE guidelines, 44% of the patients presenting with chest pain in this cohort would have been investigated invasively, with diagnostic coronary angiography. Using the ESC guidelines, only 0.3% of the patients would be investigated invasively. Conclusion: The large discrepancy between the two guidelines can be easily reduced if NICE adopted the ESC risk score. PMID:26673458

  8. Student Test Scores: How the Sausage Is Made and Why You Should Care. Evidence Speaks Reports, Vol 1, #25

    Science.gov (United States)

    Jacob, Brian A.

    2016-01-01

    Contrary to popular belief, modern cognitive assessments--including the new Common Core tests--produce test scores based on sophisticated statistical models rather than the simple percent of items a student answers correctly. While there are good reasons for this, it means that reported test scores depend on many decisions made by test designers,…

  9. Preoptometry and optometry school grade point average and optometry admissions test scores as predictors of performance on the national board of examiners in optometry part I (basic science) examination.

    Science.gov (United States)

    Bailey, J E; Yackle, K A; Yuen, M T; Voorhees, L I

    2000-04-01

    To evaluate preoptometry and optometry school grade point averages (GPAs) and Optometry Admission Test (OAT) scores as predictors of performance on the National Board of Examiners in Optometry (NBEO) Part I (Basic Science) (NBEOPI) examination. Simple and multiple correlation coefficients were computed from data obtained from a sample of three consecutive classes of optometry students (1995-1997; n = 278) at Southern California College of Optometry. The GPA after year two of optometry school had the highest correlation (r = 0.75) among all predictor variables; the average of all scores on the OAT had the highest correlation among preoptometry predictor variables (r = 0.46). Stepwise regression analysis indicated that a combination of the optometry GPA, the OAT Academic Average, and the GPA in certain optometry curricular tracks resulted in an improved correlation (multiple r = 0.81). Predicted NBEOPI scores were computed from the regression equation and then analyzed by receiver operating characteristic (ROC) and statistic of agreement (kappa) methods. From this analysis, we identified the predicted score that maximized identification of true and false NBEOPI failures (71% and 10%, respectively). Cross validation of this result on a separate class of optometry students resulted in a slightly lower correlation between actual and predicted NBEOPI scores (r = 0.77) but showed the criterion-predicted score to be somewhat lax. The optometry school GPA after 2 years is a reasonably good predictor of performance on the full NBEOPI examination, but the prediction is enhanced by adding the Academic Average OAT score. However, prediction of performance in certain subject areas of the NBEOPI examination, for example Psychology and Ocular/Visual Biology, was rather insubstantial. Nevertheless, predicting NBEOPI performance from the best combination of year two optometry GPAs and preoptometry variables is better than has been shown in previous studies predicting optometry GPA from the best

  10. THE EFFICIENCY OF TENNIS DOUBLES SCORING SYSTEMS

    Directory of Open Access Journals (Sweden)

    Geoff Pollard

    2010-09-01

    In this paper, a family of scoring systems for tennis doubles is established for testing the hypothesis that pair A is better than pair B versus the alternative hypothesis that pair B is better than pair A. This family of scoring systems can be used as a benchmark against which the efficiency of any doubles scoring system can be assessed. Thus, the formula for the efficiency of any doubles scoring system is derived. As in tennis singles, one scoring system based on the play-the-loser structure is shown to be more efficient than the benchmark systems. An expression for the relative efficiency of two doubles scoring systems is derived. Thus, the relative efficiency of the various scoring systems presently used in doubles can be assessed. The methods of this paper can be extended to a match between two teams of 2, 4, 8, … doubles pairs, so that it is possible to establish a measure for the relative efficiency of the various systems used for tennis contests between teams of players.

  11. Derivation and Cross-Validation of Cutoff Scores for Patients With Schizophrenia Spectrum Disorders on WAIS-IV Digit Span-Based Performance Validity Measures.

    Science.gov (United States)

    Glassmire, David M; Toofanian Ross, Parnian; Kinney, Dominique I; Nitch, Stephen R

    2016-06-01

    Two studies were conducted to identify and cross-validate cutoff scores on the Wechsler Adult Intelligence Scale-Fourth Edition Digit Span-based embedded performance validity (PV) measures for individuals with schizophrenia spectrum disorders. In Study 1, normative scores were identified on Digit Span-embedded PV measures among a sample of patients (n = 84) with schizophrenia spectrum diagnoses who had no known incentive to perform poorly and who put forth valid effort on external PV tests. Previously identified cutoff scores resulted in unacceptable false positive rates and lower cutoff scores were adopted to maintain specificity levels ≥90%. In Study 2, the revised cutoff scores were cross-validated within a sample of schizophrenia spectrum patients (n = 96) committed as incompetent to stand trial. Performance on Digit Span PV measures was significantly related to Full Scale IQ in both studies, indicating the need to consider the intellectual functioning of examinees with psychotic spectrum disorders when interpreting scores on Digit Span PV measures. © The Author(s) 2015.

  12. 7 CFR 28.952 - Testing of samples.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Testing of samples. 28.952 Section 28.952 Agriculture Regulations of the Department of Agriculture AGRICULTURAL MARKETING SERVICE (Standards, Inspections, Marketing... processing tests of the properties of cotton samples and report the results thereof to the persons from whom...

  13. Critique of the Watson-Glaser Critical Thinking Appraisal Test: The More You Know, the Lower Your Score

    Directory of Open Access Journals (Sweden)

    Kevin Possin

    2014-12-01

    Full Text Available The Watson-Glaser Critical Thinking Appraisal Test is one of the oldest, most frequently used, multiple-choice critical-thinking tests on the market in business, government, and legal settings for purposes of hiring and promotion. I demonstrate, however, that the test has serious construct-validity issues, stemming primarily from its ambiguous, unclear, misleading, and sometimes mysterious instructions, which have remained unaltered for decades. Erroneously scored items further diminish the test’s validity. As a result, having enhanced knowledge of formal and informal logic could well result in test subjects receiving lower scores on the test. That’s not how things should work for a CT assessment test.

  14. Two acute kidney injury risk scores for critically ill cancer patients undergoing non-cardiac surgery.

    Science.gov (United States)

    Xing, Xue-Zhong; Wang, Hai-Jun; Huang, Chu-Lin; Yang, Quan-Hui; Qu, Shi-Ning; Zhang, Hao; Wang, Hao; Gao, Yong; Xiao, Qing-Ling; Sun, Ke-Lin

    2012-01-01

    Several risk scores have been used in predicting acute kidney injury (AKI) of patients undergoing general or specific operations such as cardiac surgery. This study aimed to evaluate the use of two AKI risk scores in patients who underwent non-cardiac surgery but required intensive care. The clinical data of patients who had been admitted to ICU during the first 24 hours of ICU stay between September 2009 and August 2010 at the Cancer Institute, Chinese Academy of Medical Sciences & Peking Union Medical College were retrospectively collected and analyzed. AKI was diagnosed based on the acute kidney injury network (AKIN) criteria. Two AKI risk scores were calculated: Kheterpal and Abelha factors. The incidence of AKI was 10.3%. Patients who developed AKI had an increased ICU mortality of 10.9% vs. 1.0% and an in-hospital mortality of 13.0% vs. 1.5%, compared with those without AKI. There was a significant difference between the classification of Kheterpal's AKI risk scores and the occurrence of AKI (P [...] Abelha's AKI risk scores and the occurrence of AKI (P=0.499). Receiver operating characteristic curves demonstrated an area under the curve of 0.655±0.043 (P=0.001, 95% confidence interval: 0.571-0.739) for Kheterpal's AKI risk score and 0.507±0.044 (P=0.879, 95% confidence interval: 0.422-0.592) for Abelha's AKI risk score. Kheterpal's AKI risk scores are more accurate than Abelha's AKI risk scores in predicting the occurrence of AKI in patients undergoing non-cardiac surgery, with moderate predictive capability.
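
    The area under the ROC curve reported above has a simple rank interpretation: it is the probability that a randomly chosen patient who developed AKI has a higher risk score than a randomly chosen patient who did not, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it by direct pair counting; the function name and the toy scores in the usage note are hypothetical and unrelated to the study data.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for c in case_scores:
        for n in control_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

    With toy inputs, `auc_from_scores([3, 4], [1, 2])` gives perfect separation, while identical score lists for cases and controls give 0.5, the value expected from a score with no discriminating ability, which is close to the 0.507 reported for the Abelha score.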

  15. ISOLOK VALVE ACCEPTANCE TESTING FOR DWPF SME SAMPLING PROCESS

    Energy Technology Data Exchange (ETDEWEB)

    Edwards, T.; Hera, K.; Coleman, C.; Jones, M.; Wiedenman, B.

    2011-12-05

    the two locations were compared to determine if the contents of the tank were well mixed. The Coliwasa sampler is a tube with a stopper at the bottom and is designed to obtain grab samples from specific locations within the drum contents. A position paper (4) was issued to address the prototypic flow loop issues and simulant selections. A statistically designed plan (5) was issued to address the total number of samples each sampler needed to pull, to provide the random order in which samples were pulled and to group samples for elemental analysis. The TTR required that the Isolok sampler perform as well as the Hydragard sampler during these tests to ensure the acceptability of the Isolok sampler for use in the DWPF sampling cells. Procedure No.L9.4-5015 was used to document the sample parameters and process steps. Completed procedures are located in R&D Engineering job folder 23269.

  16. College Math Assessment: SAT Scores vs. College Math Placement Scores

    Science.gov (United States)

    Foley-Peres, Kathleen; Poirier, Dawn

    2008-01-01

    Many colleges and universities use SAT math scores or math placement tests to place students in the appropriate math course. This study compares the use of math placement scores and SAT scores for 188 freshman students. The students' grades and faculty observations were analyzed to determine if the SAT scores and/or college math assessment scores…

  17. A Comparison of Scores on the WISC-R and Lorge-Thorndike Intelligence Test for Disadvantaged Black Elementary School Children

    Science.gov (United States)

    Lowe, James D.; Karnes, Frances A.

    1976-01-01

    It is indicated that, although the scores [obtained on both tests] are significantly correlated, the tests yield significantly different scores with the Lorge-Thorndike consistently overestimating the WISC-R full scale I.Q. (Author)

  18. The Mediating Effect of Listening Metacognitive Awareness between Test-Taking Motivation and Listening Test Score: An Expectancy-Value Theory Approach

    OpenAIRE

    Xu, Jian

    2017-01-01

    The present study investigated test-taking motivation in an L2 listening testing context by applying Expectancy-Value Theory as the framework. Specifically, this study was intended to examine the complex relationships among expectancy, importance, interest, listening anxiety, listening metacognitive awareness, and listening test score using data from a large-scale and high-stakes language test among Chinese first-year undergraduates. Structural equation modeling was used to examine the mediating...

  19. International Test Score Comparisons and Educational Policy: A Review of the Critiques

    Science.gov (United States)

    Carnoy, Martin

    2015-01-01

    Stanford education professor Martin Carnoy examines four main critiques of how international test results are used in policymaking. Of particular interest are critiques of the policy analyses published by the Program for International Student Assessment (PISA). Using average PISA scores as a comparative measure of student achievement is misleading…

  20. Estimating the Reliability of Aggregated and Within-Person Centered Scores in Ecological Momentary Assessment

    Science.gov (United States)

    Huang, Po-Hsien; Weng, Li-Jen

    2012-01-01

    A procedure for estimating the reliability of test scores in the context of ecological momentary assessment (EMA) was proposed to take into account the characteristics of EMA measures. Two commonly used test scores in EMA were considered: the aggregated score (AGGS) and the within-person centered score (WPCS). Conceptually, AGGS and WPCS represent…

  1. The Impact of the Use of Hierarchical Teaching on Test Scores of Students’ Technology

    Directory of Open Access Journals (Sweden)

    Zhao Guorong

    2015-01-01

    Test scores of students’ technology is the main basis for physical examination of college students’ physical, fitness evaluation based on test results. To change the view by the stratified teaching method consistent system of teaching mode, special movement technical level of students is improved significantly.

  2. Evaluating the Validity and Applicability of Automated Essay Scoring in Two Massive Open Online Courses

    Directory of Open Access Journals (Sweden)

    Erin Dawna Reilly

    2014-11-01

    The use of massive open online courses (MOOCs) to expand students’ access to higher education has raised questions regarding the extent to which this course model can provide and assess authentic, higher level student learning. In response to this need, MOOC platforms have begun utilizing automated essay scoring (AES) systems that allow students to engage in critical writing and free-response activities. However, there is a lack of research investigating the validity of such systems in MOOCs. This research examined the effectiveness of an AES tool to score writing assignments in two MOOCs. Results indicated that some significant differences existed between Instructor grading, AES-Holistic scores, and AES-Rubric Total scores within two MOOC courses. However, use of the AES system may still be useful given instructors’ assessment needs and intent. Findings from this research have implications for instructional technology administrators, educational designers, and instructors implementing AES learning activities in MOOC courses.

  3. The Effects of Listening to Music Just Before Reading Test on Students’ Test Score

    OpenAIRE

    MAHDAVI, Mojtaba

    2015-01-01

    Abstract. In this study the researcher examined the effect on reading comprehension of music played just before the test. Because the emotional consequences of music listening are evident in stress and anxiety removal, it was used as a tool to pacify the minds of the testees and boost their memory and the related cognitive processes. The experimental group did well with the mean score of (…) and control group (…). This study confirmed that using multimedia devices such as music can not only i...

  4. Identifying genetic marker sets associated with phenotypes via an efficient adaptive score test

    KAUST Repository

    Cai, T.; Lin, X.; Carroll, R. J.

    2012-01-01

    the overall effect of a marker-set have been actively studied in recent years. For example, score tests derived under an Empirical Bayes (EB) framework (Liu and others, 2007. Semiparametric regression of multidimensional genetic pathway data: least

  5. Gender Gaps in High School GPA and ACT Scores: High School Grade Point Average and ACT Test Score by Subject and Gender. Information Brief 2014-12

    Science.gov (United States)

    ACT, Inc., 2014

    2014-01-01

    Female students who graduated from high school in 2013 averaged higher grades than their male counterparts in all subjects, but male graduates earned higher scores on the math and science sections of the ACT. This information brief looks at high school grade point average and ACT test score by subject and gender

  6. High Baseline Postconcussion Symptom Scores and Concussion Outcomes in Athletes.

    Science.gov (United States)

    Custer, Aimee; Sufrinko, Alicia; Elbin, R J; Covassin, Tracey; Collins, Micky; Kontos, Anthony

    2016-02-01

    Some healthy athletes report high levels of baseline concussion symptoms, which may be attributable to several factors (eg, illness, personality, somaticizing). However, the role of baseline symptoms in outcomes after sport-related concussion (SRC) has not been empirically examined. To determine if athletes with high symptom scores at baseline performed worse than athletes without baseline symptoms on neurocognitive testing after SRC. Cohort study. High school and collegiate athletic programs. A total of 670 high school and collegiate athletes participated in the study. Participants were divided into groups with either no baseline symptoms (Postconcussion Symptom Scale [PCSS] score = 0, n = 247) or a high level of baseline symptoms (PCSS score > 18 [top 10% of sample], n = 68). Participants were evaluated at baseline and 2 to 7 days after SRC with the Immediate Post-concussion Assessment and Cognitive Test and PCSS. Outcome measures were Immediate Post-concussion Assessment and Cognitive Test composite scores (verbal memory, visual memory, visual motor processing speed, and reaction time) and total symptom score on the PCSS. The groups were compared using repeated-measures analyses of variance with Bonferroni correction to assess interactions between group and time for symptoms and neurocognitive impairment. The no-symptoms group represented 38% of the original sample, whereas the high-symptoms group represented 11% of the sample. The high-symptoms group experienced a larger decline from preinjury to postinjury than the no-symptoms group in verbal (P = .03) and visual memory (P = .05). However, total concussion-symptom scores increased from preinjury to postinjury for the no-symptoms group (P = .001) but remained stable for the high-symptoms group. Reported baseline symptoms may help identify athletes at risk for worse outcomes after SRC. Clinicians should examine baseline symptom levels to better identify patients for earlier referral and treatment for their

  7. Airflow Test of Acoustic Board Samples

    DEFF Research Database (Denmark)

    Jensen, Rasmus Lund; Jensen, Lise Mellergaard

    In the laboratory of Indoor Environmental Engineering, Department of Civil Engineering, Aalborg University, an airflow test on 2x10 samples of acoustic board was carried out on the 2nd of June 2012. The tests were carried out for Rambøll and STO AG. The test includes connected values of volume flow...

  8. Two independent pivotal statistics that test location and misspecification and add-up to the Anderson-Rubin statistic

    NARCIS (Netherlands)

    Kleibergen, F.R.

    2002-01-01

    We extend the novel pivotal statistics for testing the parameters in the instrumental variables regression model. We show that these statistics result from a decomposition of the Anderson-Rubin statistic into two independent pivotal statistics. The first statistic is a score statistic that tests

  9. From Test Scores to Language Use: Emergent Bilinguals Using English to Accomplish Academic Tasks

    Science.gov (United States)

    Rodriguez-Mojica, Claudia

    2018-01-01

    Prominent discourses about emergent bilinguals' academic abilities tend to focus on performance as measured by test scores and perpetuate the message that emergent bilinguals trail far behind their peers. When we remove the constraints of formal testing situations, what can emergent bilinguals do in English as they engage in naturally occurring…

  10. 46 CFR 160.050-5 - Sampling, tests, and inspection.

    Science.gov (United States)

    2010-10-01

    ... one from which any sample ring life buoy failed the buoyancy or strength test, the sample shall... ring life buoys with this subpart. The manufacturer shall provide means to secure any test that is not... procedures. Table 160.050-5(e)—Sampling for Buoyancy Tests Lot size Number of life buoys in sample 100 and...

  11. A Case Study About Why It Can Be Difficult To Test Whether Propensity Score Analysis Works in Field Experiments

    Directory of Open Access Journals (Sweden)

    Thomas D. Cook

    2012-01-01

    Peikes, Moreno and Orzol (2008) sensibly caution researchers that propensity score analysis may not lead to valid causal inference in field applications. But at the same time, they made the far stronger claim to have performed an ideal test of whether propensity score matching in quasi-experimental data is capable of approximating the results of a randomized experiment in their dataset, and that this ideal test showed that such matching could not do so. In this article we show that their study does not support that conclusion because it failed to meet a number of basic criteria for an ideal test. By implication, many other purported tests of the effectiveness of propensity score analysis probably also fail to meet these criteria, and are therefore questionable contributions to the literature on the effects of propensity score analysis.

  12. College students' drinking patterns: trajectories of AUDIT scores during the first four years at university.

    Science.gov (United States)

    Johnsson, Kent O; Leifman, Anders; Berglund, Mats

    2008-01-01

    Changes in AUDIT score trajectories were examined in a student population during their first 4 years at a university, including high-risk consumers and a subsample of low-risk consumers. 359 students were selected for the present study, comprising all high-risk consumers (the 27% with highest scores, i.e. 11 for males and 7 for females) and a randomized sample of low-risk consumers (n = 177 and 182, respectively). The Alcohol Use Disorder Identification Test (AUDIT) was used as screening instrument. Trajectory analyses were made using a semiparametric group-based model. In the low-AUDIT group, five distinct trajectories were identified: three stable non-risky consumption groups (83%) and two increasing groups (17%; from non-risky to risky). In the high-AUDIT group, three groups were identified: two stable high groups (58%) and one decreasing group (from risky to non-risky consumption; 41%). In the integrated model, stable risky consumption comprised 16% of the total sample, decreasing consumption 11%, increasing consumption comprised 13% and stable non-risky consumption 60% of the sample. Gender influenced the trajectories. The pattern of changes in risk consumption is similar to that found in corresponding US studies. (c) 2008 S. Karger AG, Basel

  13. Randomized Comparison of Two Vaginal Self-Sampling Methods for Human Papillomavirus Detection: Dry Swab versus FTA Cartridge

    OpenAIRE

    Catarino, Rosa; Vassilakos, Pierre; Bilancioni, Aline; Vanden Eynde, Mathieu; Meyer-Hamme, Ulrike; Menoud, Pierre-Alain; Guerry, Frédéric; Petignat, Patrick

    2015-01-01

    Background Human papillomavirus (HPV) self-sampling (self-HPV) is valuable in cervical cancer screening. HPV testing is usually performed on physician-collected cervical smears stored in liquid-based medium. Dry filters and swabs are an alternative. We evaluated the adequacy of self-HPV using two dry storage and transport devices, the FTA cartridge and swab. Methods A total of 130 women performed two consecutive self-HPV samples. Randomization determined which of the two tests was performed f...

  14. The Disaggregation of Value-Added Test Scores to Assess Learning Outcomes in Economics Courses

    Science.gov (United States)

    Walstad, William B.; Wagner, Jamie

    2016-01-01

    This study disaggregates posttest, pretest, and value-added or difference scores in economics into four types of economic learning: positive, retained, negative, and zero. The types are derived from patterns of student responses to individual items on a multiple-choice test. The micro and macro data from the "Test of Understanding in College…

  15. Extension of the lod score: the mod score.

    Science.gov (United States)

    Clerget-Darpoux, F

    2001-01-01

    In 1955 Morton proposed the lod score method both for testing linkage between loci and for estimating the recombination fraction between them. If a disease is controlled by a gene at one of these loci, the lod score computation requires the prior specification of an underlying model that assigns the probabilities of genotypes from the observed phenotypes. To address the case of linkage studies for diseases with unknown mode of inheritance, we suggested (Clerget-Darpoux et al., 1986) extending the lod score function to a so-called mod score function. In this function, the variables are both the recombination fraction and the disease model parameters. Maximizing the mod score function over all these parameters amounts to maximizing the probability of marker data conditional on the disease status. Under the absence of linkage, the mod score conforms to a chi-square distribution, with extra degrees of freedom in comparison to the lod score function (MacLean et al., 1993). The mod score is asymptotically maximum for the true disease model (Clerget-Darpoux and Bonaïti-Pellié, 1992; Hodge and Elston, 1994). Consequently, the power to detect linkage through mod score will be highest when the space of models where the maximization is performed includes the true model. On the other hand, one must avoid overparametrization of the model space. For example, when the approach is applied to affected sibpairs, only two constrained disease model parameters should be used (Knapp et al., 1994) for the mod score maximization. It is also important to emphasize the existence of a strong correlation between the disease gene location and the disease model. Consequently, there is poor resolution of the location of the susceptibility locus when the disease model at this locus is unknown. Of course, this is true regardless of the statistics used. The mod score may also be applied in a candidate gene strategy to model the potential effect of this gene in the disease. Since, however, it

  16. The accuracy of Internet search engines to predict diagnoses from symptoms can be assessed with a validated scoring system.

    Science.gov (United States)

    Shenker, Bennett S

    2014-02-01

    To validate a scoring system that evaluates the ability of Internet search engines to correctly predict diagnoses when symptoms are used as search terms. We developed a five-point scoring system to evaluate the diagnostic accuracy of Internet search engines. We identified twenty diagnoses common to a primary care setting to validate the scoring system. One investigator entered the symptoms for each diagnosis into three Internet search engines (Google, Bing, and Ask) and saved the first five webpages from each search. Other investigators reviewed the webpages and assigned a diagnostic accuracy score. They rescored a random sample of webpages two weeks later. To validate the five-point scoring system, we calculated convergent validity and test-retest reliability using Kendall's W and Spearman's rho, respectively. We used the Kruskal-Wallis test to look for differences in accuracy scores for the three Internet search engines. A total of 600 webpages were reviewed. Kendall's W for the raters was 0.71 (p … search engines is a valid and reliable instrument. The scoring system may be used in future Internet research. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
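
    The agreement statistic named in this abstract, Kendall's W, can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes no tied scores within a rater (so no tie correction is applied), and the ratings are hypothetical rather than study data.

```python
def kendalls_w(ratings):
    """Kendall's coefficient of concordance for m raters scoring n items.

    `ratings` is a list of m lists, each holding one rater's scores for the
    same n items. Assumes no tied scores within a rater (no tie correction).
    """
    m = len(ratings)
    n = len(ratings[0])
    rank_sums = [0] * n
    for scores in ratings:
        # Rank this rater's scores from 1 (lowest) to n (highest).
        order = sorted(range(n), key=lambda i: scores[i])
        for rank, item in enumerate(order, start=1):
            rank_sums[item] += rank
    mean_rank_sum = m * (n + 1) / 2
    # Sum of squared deviations of the item rank sums from their mean.
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical accuracy scores from three raters for five webpages.
raters = [
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 3, 2, 4, 5],
]
w = kendalls_w(raters)
print(round(w, 3))
```

    W ranges from 0 (no agreement among raters) to 1 (identical rankings). A five-point scale applied to many webpages will produce ties, so a real analysis would need the tie-corrected form of the statistic.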

  17. The TSCA interagency testing committee's approaches to screening and scoring chemicals and chemical groups: 1977-1983

    Energy Technology Data Exchange (ETDEWEB)

    Walker, J.D. [Environmental Protection Agency, Washington, DC (United States)

    1990-12-31

    This paper describes the TSCA interagency testing committee's (ITC) approaches to screening and scoring chemicals and chemical groups between 1977 and 1983. During this time the ITC conducted five scoring exercises to select chemicals and chemical groups for detailed review and to determine which of these chemicals and chemical groups should be added to the TSCA Section 4(e) Priority Testing List. 29 refs., 1 fig., 2 tabs.

  18. Analysis of the Raven CPM Subtest Scores for a Sample of Gifted Children.

    Science.gov (United States)

    Kluever, Raymond C.; Green, Kathy E.

    The inter-subject/intra-subject subtest patterns (profiles) of the same sample of gifted children were examined based on factors found in a previous study of the Raven Coloured Progressive Matrices Test (CPM) that investigated structural properties with specific application to a sample of gifted children. The sample consisted of 166 children (78…

  19. Evaluation of the concomitant use of two different EIA tests for HIV screening in blood banks

    Directory of Open Access Journals (Sweden)

    Otani Marcia M.

    2003-01-01

    OBJECTIVE: In 1998, the Brazilian Ministry of Health made it mandatory for all blood banks in the country to screen donated blood for human immunodeficiency virus (HIV) concomitantly using two different enzyme immunoassay (EIA) tests. Concerned with the best use of available resources, our objective with this study was to evaluate the usefulness of conducting two EIA screening tests instead of just one. METHODS: We analyzed data from 1999 through 2001 obtained by testing 698 191 units of donated blood using two EIA HIV screening tests concomitantly at the Pro-Blood Foundation/Blood Center of São Paulo (Fundação Pró-Sangue/Hemocentro de São Paulo), which is a major blood center in the city of São Paulo, Brazil. All samples reactive in at least one of the two EIA tests were submitted for confirmation by a Western blot (WB) test, and the persons who had donated those samples were also asked to return and provide a follow-up sample. RESULTS: Out of the 698 191 blood units that were donated, 2 718 of them (0.4%) had to be discarded because they were reactive to at least one of the EIA tests. There were two WB-positive donation samples that were reactive in only one HIV EIA screening test. On their follow-up samples, both donors tested WB-negative. These cases were considered false positive results at screening. Of the 2 718 donors who were asked to return and provide a follow-up sample, 1 576 of them (58%) did so. From these 1 576 persons, we found that there were two individuals who had been reactive to only one of the two EIA screening tests and who had also been negative on the WB at screening but who were fully seroconverted on the follow-up sample. We thus estimated that, in comparison to the use of a single EIA screening test, the use of two EIA screening tests would detect only one extra sample out of 410 700 units of blood. CONCLUSIONS: Our data do not support the use of two different, concomitant EIA screening tests for HIV. The great

  20. The Reliability and Validity of Weighted Composite Scores.

    Science.gov (United States)

    Kane, Michael; Case, Susan

    The scores on two distinct tests (e.g., essay and objective) are often combined into a composite score, which is used to make decisions. The validity of the observed composite can sometimes be evaluated relative to a separate criterion. In cases where no criterion is available, the observed composite has generally been evaluated in terms of its…

  1. Comparing the MMPI-2 Scale Scores of Parents Involved in Parental Competency and Child Custody Assessments

    Science.gov (United States)

    Resendes, John; Lecci, Len

    2012-01-01

    MMPI-2 scores from a parent competency sample (N = 136 parents) are compared with a previously published data set of MMPI-2 scores for child custody litigants (N = 508 parents; Bathurst et al., 1997). Independent samples t tests yielded significant and in some cases substantial differences on the standard MMPI-2 clinical scales (especially Scales…

  2. A Case Study About Why It Can Be Difficult To Test Whether Propensity Score Analysis Works in Field Experiments

    Directory of Open Access Journals (Sweden)

    William R. Shadish

    2013-02-01

    Peikes, Moreno and Orzol (2008) sensibly caution researchers that propensity score analysis may not lead to valid causal inference in field applications. But at the same time, they made the far stronger claim to have performed an ideal test of whether propensity score matching in quasi-experimental data is capable of approximating the results of a randomized experiment in their dataset, and that this ideal test showed that such matching could not do so. In this article we show that their study does not support that conclusion because it failed to meet a number of basic criteria for an ideal test. By implication, many other purported tests of the effectiveness of propensity score analysis probably also fail to meet these criteria, and are therefore questionable contributions to the literature on the effects of propensity score analysis. DOI: 10.2458/azu_jmmss.v3i2.16475

  3. [The diagnostic scores for deep venous thrombosis].

    Science.gov (United States)

    Junod, A

    2015-08-26

    Seven diagnostic scores for deep venous thrombosis (DVT) of the lower limbs are analyzed and compared. Two features make this exercise difficult: the problem of distal DVT and of their proximal extension and the status of patients, whether out- or in-patients. The most popular score is the Wells score (1997), modified in 2003. It includes one subjective element based on clinical judgment. The Primary Care score (2005), less known, has similar properties, but uses only objective data. The present trend is to associate clinical scores with the measurement of D-dimers to rule out DVT with good sensitivity. For upper limb DVT, the Constans score (2008) is available, which can also be coupled with D-dimer testing (Kleinjan).

  4. Evaluation of the validity of osteoporosis and fracture risk assessment tools (IOF One Minute Test, SCORE, and FRAX) in postmenopausal Palestinian women.

    Science.gov (United States)

    Kharroubi, Akram; Saba, Elias; Ghannam, Ibrahim; Darwish, Hisham

    2017-12-01

    Simple self-assessment tools are needed to identify women at high risk for developing osteoporosis. In this study, tools like the IOF One Minute Test, Fracture Risk Assessment Tool (FRAX), and Simple Calculated Osteoporosis Risk Estimation (SCORE) were found to be valid for Palestinian women. The threshold for predicting women at risk for each tool was estimated. The purpose of this study is to evaluate the validity of the updated IOF (International Osteoporosis Foundation) One Minute Osteoporosis Risk Assessment Test, FRAX, SCORE as well as age alone to detect the risk of developing osteoporosis in postmenopausal Palestinian women. Three hundred eighty-two women 45 years and older were recruited including 131 women with osteoporosis and 251 controls following bone mineral density (BMD) measurement, 287 completed questionnaires of the different risk assessment tools. Receiver operating characteristic (ROC) curves were evaluated for each tool using BMD as the gold standard for osteoporosis. The area under the ROC curve (AUC) was the highest for FRAX calculated with BMD for predicting hip fractures (0.897) followed by FRAX for major fractures (0.826) with cut-off values >1.5 and >7.8%, respectively. The IOF One Minute Test AUC (0.629) was the lowest compared to other tested tools but with sufficient accuracy for predicting the risk of developing osteoporosis with a cut-off value >4 total yes questions out of 18. The SCORE test and age alone were also good predictors of risk for developing osteoporosis. According to the ROC curve for age, women ≥64 years had a higher risk of developing osteoporosis. A higher percentage of women with low BMD (T-score ≤-1.5) or osteoporosis (T-score ≤-2.5) was found among women who were not exposed to the sun, who had menopause before the age of 45 years, or had lower body mass index (BMI) compared to controls. Women who often fall had lower BMI and approximately 27% of the recruited postmenopausal

  5. The Alcohol Use Disorders Identification Scale (AUDIT) normative scores for a multiracial sample of Rhodes University residence students.

    Science.gov (United States)

    Young, Charles; Mayson, Tamara

    2010-06-01

    The objective of this research is to obtain accurate drinking norms for students living in the university residences in preparation for future social norms interventions that would allow individual students to compare their drinking to an appropriate reference group. Random cluster sampling was used to obtain data from 318 residence students who completed the Alcohol Use Disorders Identification Test (AUDIT), a brief, reliable and valid screening measure designed by the World Health Organisation (Babor et al. 2001). The Cronbach alpha coefficient of 0.83 reported for this multicultural sample is high, suggesting that the AUDIT may be reliably used in this and similar contexts. Normative scores are reported in the form of percentiles. Comparisons between the portions of students drinking safely and hazardously according to race and gender indicate that while male students are drinking no more hazardously than female students, white students drink far more hazardously than black students. These differences suggest that both race- and gender-specific norms would be essential for an effective social norms intervention in this multicultural South African context. Finally, the racialised drinking patterns might reflect an informal segregation of social space at Rhodes University.
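
    The internal-consistency figure quoted in this abstract (Cronbach's alpha of 0.83) comes from a standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The following is a minimal sketch of that formula using hypothetical responses to a three-item scale; the AUDIT has ten items, and none of the numbers below are study data.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Transpose so that each tuple holds all respondents' scores on one item.
    items = list(zip(*responses))
    item_variance_sum = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(r) for r in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical scores of five respondents on a three-item scale.
responses = [
    [0, 1, 1],
    [1, 2, 2],
    [2, 2, 3],
    [3, 4, 4],
    [4, 4, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

    Population variances are used for both the items and the totals; using sample variances throughout gives the same alpha because the correction factor cancels in the ratio.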

  6. Are students' impressions of improved learning through active learning methods reflected by improved test scores?

    Science.gov (United States)

    Everly, Marcee C

    2013-02-01

    To report the transformation from lecture to more active learning methods in a maternity nursing course and to evaluate whether student perception of improved learning through active-learning methods is supported by improved test scores. The process of transforming a course into an active-learning model of teaching is described. A voluntary mid-semester survey for student acceptance of the new teaching method was conducted. Course examination results, from both a standardized exam and a cumulative final exam, among students who received lecture in the classroom and students who had active learning activities in the classroom were compared. Active learning activities were very acceptable to students. The majority of students reported learning more from having active-learning activities in the classroom rather than lecture-only and this belief was supported by improved test scores. Students who had active learning activities in the classroom scored significantly higher on a standardized assessment test than students who received lecture only. The findings support the use of student reflection to evaluate the effectiveness of active-learning methods and help validate the use of student reflection of improved learning in other research projects. Copyright © 2011 Elsevier Ltd. All rights reserved.

  7. Empirical Correlates of Low Scores on MMPI-2/MMPI-2-RF Restructured Clinical Scales in a Sample of University Students

    Science.gov (United States)

    Avdeyeva, Tatyana V.; Tellegen, Auke; Ben-Porath, Yossef S.

    2012-01-01

    In the present study, the authors explored the meaning of low scores on the MMPI-2/MMPI-2-RF Restructured Clinical (RC) scales. Using responses of a sample of university students (N = 811), the authors examined whether low (T less than 39), within-normal-limits (T = 39-64), and high (T greater than 65) score levels on the RC scales are…

  8. Two-Stage Variable Sample-Rate Conversion System

    Science.gov (United States)

    Tkacenko, Andre

    2009-01-01

    A two-stage variable sample-rate conversion (SRC) system has been proposed as part of a digital signal-processing system in a digital communication radio receiver that utilizes a variety of data rates. The proposed system would be used as an interface between (1) an analog-to-digital converter used in the front end of the receiver to sample an intermediate-frequency signal at a fixed input rate and (2) digitally implemented tracking loops in subsequent stages that operate at various sample rates that are generally lower than the input sample rate. This Two-Stage System would be capable of converting from an input sample rate to a desired lower output sample rate that could be variable and not necessarily a rational fraction of the input rate.

  9. Changes in Student Populations and Average Test Scores of Dutch Primary Schools

    Science.gov (United States)

    Luyten, Hans; de Wolf, Inge

    2011-01-01

    This article focuses on the relation between student population characteristics and average test scores per school in the final grade of primary education from a dynamic perspective. Aggregated data of over 5,000 Dutch primary schools covering a 6-year period were used to study the relation between changes in school populations and shifts in mean…

  10. [Performance of normal young adults in two temporal resolution tests].

    Science.gov (United States)

    Zaidan, Elena; Garcia, Adriana Pontin; Tedesco, Maria Lucy Fraga; Baran, Jane A

    2008-01-01

    Temporal auditory processing is defined as the perception of sound or of sound alteration within a restricted time interval and is considered a fundamental ability for the auditory perception of verbal and nonverbal sounds, for the perception of music, rhythm, periodicity and in the discrimination of pitch, duration and of phonemes. To compare the performance of normal Brazilian adults in two temporal resolution tests: the Gaps-in-Noise Test (GIN) and the Random Gap Detection Test (RGDT), and to analyze potential differences of performance in these two tests. Twenty-five college students with normal hearing (11 males and 14 females) and no history of educational, neurological and/or language problems underwent the GIN and RGDT at 40 dB SL. Statistically significant gender effects for both tests were found, with female participants showing poorer performance on both temporal processing tests. In addition, a comparative analysis of the results obtained in the GIN and RGDT revealed significant differences in the threshold measures derived for these two tests. In general, significantly better gap detection thresholds were observed for both male and female participants on the GIN test when compared to the results obtained for the RGDT. Male participants presented better performances on both RGDT and GIN, when compared to the females. There were no differences in performance between right and left ears on the GIN test. Participants of the present investigation, males and females, performed better on the GIN when compared to the RGDT. The GIN presented advantages over the RGDT, not only in terms of clinical validity and sensitivity, but also in terms of application and scoring.

  11. A comparison of two methods of logMAR visual acuity data scoring for statistical analysis

    Directory of Open Access Journals (Sweden)

    O. A. Oduntan

    2009-12-01

    The purpose of this study was to compare two methods of logMAR visual acuity (VA) scoring. The two methods are referred to as letter scoring (method 1) and line scoring (method 2). The two methods were applied to VA data obtained from one hundred and forty (N=140) children with oculocutaneous albinism. Descriptive, correlation and regression statistics were then used to analyze the data. Also, where applicable, the Bland and Altman analysis was used to compare sets of data from the two methods. The right and left eyes data were included in the study, but because the findings were similar in both eyes, only the results for the right eyes are presented in this paper. For method 1, the mean unaided VA (mean UAOD1) = 0.39 ± 0.15 logMAR. The mean aided VA (mean ADOD1) = 0.50 ± 0.16 logMAR. For method 2, the mean unaided VA (mean UAOD2) = 0.71 ± 0.15 logMAR, while the mean aided VA (mean ADOD2) = 0.60 ± 0.16 logMAR. The range and mean values of the improvement in VA for both methods were the same. The unaided VAs (UAOD1, UAOD2) and aided VAs (ADOD1, ADOD2) for methods 1 and 2 correlated negatively (unaided, r = –1, p<0.05; aided, r = –1, p<0.05). The improvements in VA (differences between the unaided and aided VA values; DOD1 and DOD2) were positively correlated (r = +1, p<0.05). The Bland and Altman analyses showed that the VA improvements (unaided – aided VA values; DOD1 and DOD2) were similar for the two methods. Findings indicated that only the improvement in VA could be compared when different scoring methods are used. Therefore the scoring method used in any VA research project should be stated in the publication so that appropriate comparisons could be made by other researchers.
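
    A Bland and Altman comparison of two methods reduces to the mean of the paired differences (the bias) and the limits of agreement around it. The sketch below is illustrative only: the function name and the sample values are hypothetical and are not data from the study, and the conventional 1.96 standard deviation limits are assumed.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences a - b; the limits of
    agreement are bias +/- 1.96 sample standard deviations.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical VA improvement values (logMAR) for the two scoring methods.
method1 = [0.10, 0.12, 0.08, 0.15, 0.11]
method2 = [0.10, 0.10, 0.10, 0.14, 0.12]
bias, lower, upper = bland_altman(method1, method2)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```

    A bias near zero with narrow limits would be consistent with the abstract's statement that the improvement values were similar for the two methods.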

  12. The Impact of the 2004 Hurricanes on Florida Comprehensive Assessment Test Scores: Implications for School Counselors

    Science.gov (United States)

    Baggerly, Jennifer; Ferretti, Larissa K.

    2008-01-01

    What is the impact of natural disasters on students' statewide assessment scores? To answer this question, Florida Comprehensive Assessment Test (FCAT) scores of 55,881 students in grades 4 through 10 were analyzed to determine if there were significant decreases after the 2004 hurricanes. Results reveal that there was statistical but no practical…

  13. Randomized comparison of vaginal self-sampling by standard vs. dry swabs for Human papillomavirus testing

    International Nuclear Information System (INIS)

    Eperon, Isabelle; Vassilakos, Pierre; Navarria, Isabelle; Menoud, Pierre-Alain; Gauthier, Aude; Pache, Jean-Claude; Boulvain, Michel; Untiet, Sarah; Petignat, Patrick

    2013-01-01

    To evaluate if human papillomavirus (HPV) self-sampling (Self-HPV) using a dry vaginal swab is a valid alternative for HPV testing. Women attending colposcopy clinic were recruited to collect two consecutive Self-HPV samples: a Self-HPV using a dry swab (S-DRY) and a Self-HPV using a standard wet transport medium (S-WET). These samples were analyzed for HPV using real time PCR (Roche Cobas). Participants were randomized to determine the order of the tests. Questionnaires assessing preferences and acceptability for both tests were conducted. Subsequently, women were invited for colposcopic examination; a physician collected a cervical sample (physician-sampling) with a broom-type device and placed it into a liquid-based cytology medium. Specimens were then processed for the production of cytology slides and a Hybrid Capture HPV DNA test (Qiagen) was performed from the residual liquid. Biopsies were performed if indicated. Unweighted kappa statistics (κ) and McNemar tests were used to measure the agreement among the sampling methods. A total of 120 women were randomized. Overall HPV prevalence was 68.7% (95% Confidence Interval (CI) 59.3–77.2) by S-WET, 54.4% (95% CI 44.8–63.9) by S-DRY and 53.8% (95% CI 43.8–63.7) by HC. Among paired samples (S-WET and S-DRY), the overall agreement was good (85.7%; 95% CI 77.8–91.6) and the κ was substantial (0.70; 95% CI 0.57-0.70). The proportion of positive type-specific HPV agreement was also good (77.3%; 95% CI 68.2-84.9). No differences in sensitivity for cervical intraepithelial neoplasia grade one (CIN1) or worse between the two Self-HPV tests were observed. Women reported the two Self-HPV tests as highly acceptable. Self-HPV using dry swab transfer does not appear to compromise specimen integrity. Further study in a large screening population is needed. ClinicalTrials.gov: http://clinicaltrials.gov/show/NCT01316120
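
    The agreement statistics named in this abstract can be computed directly from a paired 2x2 table. The following is a minimal sketch under stated assumptions: the counts are hypothetical (not the trial's data), the function names are illustrative, and the McNemar statistic is the uncorrected chi-square form with one degree of freedom.

```python
import math


def cohens_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table.

    table[i][j] is the number of pairs rated category i by the first
    method and category j by the second method.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_expected = sum(row_marg[i] * col_marg[i] for i in range(k))
    return (p_observed - p_expected) / (1 - p_expected)


def mcnemar(table):
    """Uncorrected McNemar chi-square statistic and p-value for a 2x2 table."""
    b, c = table[0][1], table[1][0]  # discordant pairs
    stat = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical paired results: rows = first swab (+, -), columns = second swab (+, -).
paired = [[50, 10],
          [5, 35]]
kappa = cohens_kappa(paired)
stat, p_value = mcnemar(paired)
print(f"kappa={kappa:.3f}, McNemar chi2={stat:.3f}, p={p_value:.3f}")
```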

  14. Lower Quarter Y-Balance Test Scores and Lower Extremity Injury in NCAA Division I Athletes.

    Science.gov (United States)

    Lai, Wilson C; Wang, Dean; Chen, James B; Vail, Jeremy; Rugg, Caitlin M; Hame, Sharon L

    2017-08-01

    Functional movement tests that are predictive of injury risk in National Collegiate Athletic Association (NCAA) athletes are useful tools for sports medicine professionals. The Lower Quarter Y-Balance Test (YBT-LQ) measures single-leg balance and reach distances in 3 directions. To assess whether the YBT-LQ predicts the laterality and risk of sports-related lower extremity (LE) injury in NCAA athletes. Case-control study; Level of evidence, 3. The YBT-LQ was administered to 294 NCAA Division I athletes from 21 sports during preparticipation physical examinations at a single institution. Athletes were followed prospectively over the course of the corresponding season. Correlation analysis was performed between the laterality of reach asymmetry and composite scores (CS) versus the laterality of injury. Receiver operating characteristic (ROC) analysis was used to determine the optimal asymmetry cutoff score for YBT-LQ. A multivariate regression analysis adjusting for sex, sport type, body mass index, and history of prior LE surgery was performed to assess predictors of earlier and higher rates of injury. Neither the laterality of reach asymmetry nor the CS correlated with the laterality of injury. ROC analysis found optimal cutoff scores of 2, 9, and 3 cm for anterior, posteromedial, and posterolateral reach, respectively. All of these potential cutoff scores, along with a cutoff score of 4 cm used in the majority of prior studies, were associated with poor sensitivity and specificity. Furthermore, none of the asymmetric cutoff scores were associated with earlier or increased rate of injury in the multivariate analyses. YBT-LQ scores alone do not predict LE injury in this collegiate athlete population. Sports medicine professionals should be cautioned against using the YBT-LQ alone to screen for injury risk in collegiate athletes.
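
    The abstract does not state which criterion defined the "optimal" ROC cutoff. One common choice, assumed here purely for illustration, is the cutoff that maximizes Youden's J (sensitivity + specificity - 1). The function name and the asymmetry values below are hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A case is called positive when score >= cutoff. labels are 1 for
    injured and 0 for uninjured. Ties in J keep the smallest cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical reach asymmetries (cm) and injury outcomes.
asymmetry = [1, 2, 2, 3, 4, 5, 6, 7]
injured = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, sens, spec = youden_cutoff(asymmetry, injured)
print(cutoff, sens, spec)
```

    A cutoff chosen this way is only as useful as the sensitivity and specificity it achieves, which is the point the abstract makes about the YBT-LQ cutoffs.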

  15. Mixing and sampling tests for Radiochemical Plant

    International Nuclear Information System (INIS)

    Ehinger, M.N.; Marfin, H.R.; Hunt, B.

    1999-01-01

    The paper describes results and test procedures used to evaluate uncertainty and bias effects introduced by the sampler systems of a radiochemical plant, and similar parameters associated with mixing. This report will concentrate on experiences at the Barnwell Nuclear Fuels Plant. Mixing and sampling tests can be conducted to establish the statistical parameters for those activities related to overall measurement uncertainties. Density measurement by state-of-the-art, commercially available equipment is the key to conducting those tests. Experience in the U.S. suggests the statistical contribution of mixing and sampling can be controlled to less than 0.01 % and with new equipment and new tests in operating facilities might be controlled to better accuracy.

  16. Food variety, dietary diversity, and food characteristics among convenience samples of Guatemalan women.

    Science.gov (United States)

    Soto-Méndez, María José; Campos, Raquel; Hernández, Liza; Orozco, Mónica; Vossenaar, Marieke; Solomons, Noel W

    2011-01-01

    To compare variety and diversity patterns and dietary characteristics in Guatemalan women. Two non-consecutive 24-h recalls were conducted in convenience samples of 20 rural Mayan women and 20 urban students. Diversity scores were computed using three food-group systems. Variety and diversity scores and dietary origin and characteristics were compared between settings using independent t-test or Mann-Whitney-U-test. Dietary variety and diversity were generally greater in the urban sample when compared to the rural sample, depending on the number of days and food-group system used for evaluation. The diet was predominantly plant-based and composed of non-fortified food items in both areas. The rural diet was predominantly composed of traditional, non-processed foods. The urban diet was mostly based on non-traditional and processed items. Considerations of intervention strategies for dietary improvement and health protection for the Guatemalan countryside should still rely on promotion and preservation of traditional food selection.

  17. Development and validation of a composite scoring system for robot-assisted surgical training--the Robotic Skills Assessment Score.

    Science.gov (United States)

    Chowriappa, Ashirwad J; Shi, Yi; Raza, Syed Johar; Ahmed, Kamran; Stegemann, Andrew; Wilding, Gregory; Kaouk, Jihad; Peabody, James O; Menon, Mani; Hassett, James M; Kesavadas, Thenkurussi; Guru, Khurshid A

    2013-12-01

    A standardized scoring system does not exist in virtual reality-based assessment metrics to describe safe and crucial surgical skills in robot-assisted surgery. This study aims to develop an assessment score along with its construct validation. All subjects performed key tasks on previously validated Fundamental Skills of Robotic Surgery curriculum, which were recorded, and metrics were stored. After an expert consensus for the purpose of content validation (Delphi), critical safety determining procedural steps were identified from the Fundamental Skills of Robotic Surgery curriculum and a hierarchical task decomposition of multiple parameters using a variety of metrics was used to develop Robotic Skills Assessment Score (RSA-Score). Robotic Skills Assessment mainly focuses on safety in operative field, critical error, economy, bimanual dexterity, and time. Subsequently, the RSA-Score was further evaluated for construct validation and feasibility. Spearman correlation tests performed between tasks using the RSA-Scores indicate no cross correlation. Wilcoxon rank sum tests were performed between the two groups. The proposed RSA-Score was evaluated on non-robotic surgeons (n = 15) and on expert-robotic surgeons (n = 12). The expert group demonstrated significantly better performance on all four tasks in comparison to the novice group. Validation of the RSA-Score in this study was carried out on the Robotic Surgical Simulator. The RSA-Score is a valid scoring system that could be incorporated in any virtual reality-based surgical simulator to achieve standardized assessment of fundamental surgical tenets during robot-assisted surgery. Copyright © 2013 Elsevier Inc. All rights reserved.

  18. Depressive status explains a significant amount of the variance in COPD assessment test (CAT) scores.

    Science.gov (United States)

    Miravitlles, Marc; Molina, Jesús; Quintano, José Antonio; Campuzano, Anna; Pérez, Joselín; Roncero, Carlos

    2018-01-01

    COPD assessment test (CAT) is a short, easy-to-complete health status tool that has been incorporated into the multidimensional assessment of COPD in order to guide therapy; therefore, it is important to understand the factors determining CAT scores. This is a post hoc analysis of a cross-sectional, observational study conducted in respiratory medicine departments and primary care centers in Spain with the aim of identifying the factors determining CAT scores, focusing particularly on the cognitive status measured by the Mini-Mental State Examination (MMSE) and levels of depression measured by the short Beck Depression Inventory (BDI). A total of 684 COPD patients were analyzed; 84.1% were men, the mean age of patients was 68.7 years, and the mean forced expiratory volume in 1 second (%) was 55.1%. Mean CAT score was 21.8. CAT scores correlated with the MMSE score (Pearson's coefficient r =-0.371) and the BDI ( r =0.620), both p CAT scores and explained 45% of the variability. However, a model including only MMSE and BDI scores explained up to 40% and BDI alone explained 38% of the CAT variance. CAT scores are associated with clinical variables of severity of COPD. However, cognitive status and, in particular, the level of depression explain a larger percentage of the variance in the CAT scores than the usual COPD clinical severity variables.

  19. Bootstrap Score Tests for Fractional Integration in Heteroskedastic ARFIMA Models, with an Application to Price Dynamics in Commodity Spot and Futures Markets

    DEFF Research Database (Denmark)

    Cavaliere, Giuseppe; Nielsen, Morten Ørregaard; Taylor, A.M. Robert

    Empirical evidence from time series methods which assume the usual I(0)/I(1) paradigm suggests that the efficient market hypothesis, stating that spot and futures prices of a commodity should cointegrate with a unit slope on futures prices, does not hold. However, these statistical methods...... fractionally integrated model we are able to find a body of evidence in support of the efficient market hypothesis for a number of commodities. Our new tests are wild bootstrap implementations of score-based tests for the order of integration of a fractionally integrated time series. These tests are designed...... principle do. A Monte Carlo simulation study demonstrates that very significant improvements in finite sample behaviour can be obtained by the bootstrap vis-à-vis the corresponding asymptotic tests in both heteroskedastic and homoskedastic environments....

  20. The New Peabody Picture Vocabulary Test-III: An Illusion of Unbiased Assessment?

    Science.gov (United States)

    Stockman, Ida J

    2000-10-01

    This article examines whether changes in the ethnic minority composition of the standardization sample for the latest edition of the Peabody Picture Vocabulary Test (PPVT-III, Dunn & Dunn, 1997) can be used as the sole explanation for children's better test scores when compared to an earlier edition, the Peabody Picture Vocabulary Test-Revised (PPVT-R, Dunn & Dunn, 1981). Results from a comparative analysis of these two test editions suggest that other factors may explain improved performances. Among these factors are the number of words and age levels sampled, the types of words and pictures used, and characteristics of the standardization sample other than its ethnic minority composition. This analysis also raises questions regarding the usefulness of converting scores from one edition to the other and the type of criteria that could be used to evaluate whether the PPVT-III is an unbiased test of vocabulary for children from diverse cultural and linguistic backgrounds.

  1. Filtration and Leach Testing for REDOX Sludge and S-Saltcake Actual Waste Sample Composites

    Energy Technology Data Exchange (ETDEWEB)

    Shimskey, Rick W.; Billing, Justin M.; Buck, Edgar C.; Daniel, Richard C.; Draper, Kathryn E.; Edwards, Matthew K.; Geeting, John GH; Hallen, Richard T.; Jenson, Evan D.; Kozelisky, Anne E.; MacFarlan, Paul J.; Peterson, Reid A.; Snow, Lanee A.; Swoboda, Robert G.

    2009-02-20

    A testing program evaluating actual tank waste was developed in response to Task 4 from the M-12 External Flowsheet Review Team (EFRT) issue response plan. The test program was subdivided into logical increments. The bulk water-insoluble solid wastes that are anticipated to be delivered to the Waste Treatment and Immobilization Plant (WTP) were identified according to type such that the actual waste testing could be targeted to the relevant categories. Under test plan TP-RPP-WTP-467, eight broad waste groupings were defined. Samples available from the 222S archive were identified and obtained for testing. Under this test plan, a waste-testing program was implemented that included: • Homogenizing the archive samples by group as defined in the test plan • Characterizing the homogenized sample groups • Performing parametric leaching testing on each group for compounds of interest • Performing bench-top filtration/leaching tests in the hot cell for each group to simulate filtration and leaching activities if they occurred in the UFP2 vessel of the WTP Pretreatment Facility. This report focuses on filtration/leaching tests performed on two of the eight waste composite samples and follow-on parametric tests to support aluminum leaching results from those tests.

  2. Basic distribution free identification tests for small size samples of environmental data

    International Nuclear Information System (INIS)

    Federico, A.G.; Musmeci, F.

    1998-01-01

    Testing two or more data sets for the hypothesis that they are sampled from the same population is often required in environmental data analysis. Typically the available samples contain only a small number of data points, and often the assumption of normal distributions is not realistic. On the other hand, the spread of today's powerful personal computers opens new opportunities based on a massive use of CPU resources. The paper reviews the problem, introducing the feasibility of two nonparametric approaches based on intrinsic equiprobability properties of the data samples. The first is based on full resampling, while the second is based on a bootstrap approach. An easy-to-use program is presented. A case study is given based on the Chernobyl children contamination data.
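
    The resampling idea can be sketched as a two-sample permutation test: under the null hypothesis every assignment of the pooled values to the two groups is equally probable, so the observed difference is compared with the differences obtained after reshuffling. This is not the authors' program. The function name, the choice of the difference in means as the statistic, and the Monte Carlo approximation (random shuffles rather than full enumeration) are all assumptions made for illustration.

```python
import random


def permutation_test(x, y, n_resamples=10000, seed=0):
    """Approximate two-sided permutation p-value for a difference in means.

    The pooled data are reshuffled n_resamples times; the p-value is the
    share of shuffles whose absolute mean difference is at least as large
    as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    n_x = len(x)
    pooled = list(x) + list(y)

    def abs_mean_diff(values):
        left, right = values[:n_x], values[n_x:]
        return abs(sum(left) / len(left) - sum(right) / len(right))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if abs_mean_diff(pooled) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


# Hypothetical small samples, as is typical for environmental data.
site_a = [1, 2, 3, 4, 5, 6]
site_b = [11, 12, 13, 14, 15, 16]
p_value = permutation_test(site_a, site_b, n_resamples=2000)
print(f"p = {p_value:.4f}")
```

    For samples this small, enumerating every split of the pooled data is also feasible and gives an exact p-value; random shuffling is used here only to keep the sketch short.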

  3. 40 CFR 205.171-3 - Test motorcycle sample selection.

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment 24 2010-07-01 2010-07-01 false Test motorcycle sample selection. 205... ABATEMENT PROGRAMS TRANSPORTATION EQUIPMENT NOISE EMISSION CONTROLS Motorcycle Exhaust Systems § 205.171-3 Test motorcycle sample selection. A test motorcycle to be used for selective enforcement audit testing...

  4. Sequential Neighborhood Effects: The Effect of Long-Term Exposure to Concentrated Disadvantage on Children's Reading and Math Test Scores.

    Science.gov (United States)

    Hicks, Andrew L; Handcock, Mark S; Sastry, Narayan; Pebley, Anne R

    2018-02-01

    Prior research has suggested that children living in a disadvantaged neighborhood have lower achievement test scores, but these studies typically have not estimated causal effects that account for neighborhood choice. Recent studies used propensity score methods to account for the endogeneity of neighborhood exposures, comparing disadvantaged and nondisadvantaged neighborhoods. We develop an alternative propensity function approach in which cumulative neighborhood effects are modeled as a continuous treatment variable. This approach offers several advantages. We use our approach to examine the cumulative effects of neighborhood disadvantage on reading and math test scores in Los Angeles. Our substantive results indicate that recency of exposure to disadvantaged neighborhoods may be more important than average exposure for children's test scores. We conclude that studies of child development should consider both average cumulative neighborhood exposure and the timing of this exposure.

  5. Cytogenotoxicity screening of source water, wastewater and treated water of drinking water treatment plants using two in vivo test systems: Allium cepa root based and Nile tilapia erythrocyte based tests.

    Science.gov (United States)

    Hemachandra, Chamini K; Pathiratne, Asoka

    2017-01-01

    Biological effect directed in vivo tests with model organisms are useful in assessing potential health risks associated with chemical contaminations in surface waters. This study examined the applicability of two in vivo test systems viz. plant, Allium cepa root based tests and fish, Oreochromis niloticus erythrocyte based tests for screening cytogenotoxic potential of raw source water, water treatment waste (effluents) and treated water of drinking water treatment plants (DWTPs) using two DWTPs associated with a major river in Sri Lanka. Measured physico-chemical parameters of the raw water, effluents and treated water samples complied with the respective Sri Lankan standards. In the in vivo tests, raw water induced statistically significant root growth retardation, mitodepression and chromosomal abnormalities in the root meristem of the plant and micronuclei/nuclear buds evolution and genetic damage (as reflected by comet scores) in the erythrocytes of the fish compared to the aged tap water controls signifying greater genotoxicity of the source water especially in the dry period. The effluents provoked relatively high cytogenotoxic effects on both test systems but the toxicity in most cases was considerably reduced to the raw water level with the effluent dilution (1:8). In vivo tests indicated reduction of cytogenotoxic potential in the tested drinking water samples. The results support the potential applications of practically feasible in vivo biological test systems such as A. cepa root based tests and the fish erythrocyte based tests as complementary tools for screening cytogenotoxicity potential of the source water and water treatment waste reaching downstream of aquatic ecosystems and for evaluating cytogenotoxicity eliminating efficacy of the DWTPs in different seasons in view of human and ecological safety. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. Efficient Noninferiority Testing Procedures for Simultaneously Assessing Sensitivity and Specificity of Two Diagnostic Tests

    Directory of Open Access Journals (Sweden)

    Guogen Shan

    2015-01-01

    Sensitivity and specificity are often used to assess the performance of a diagnostic test with binary outcomes. Wald-type test statistics have been proposed for testing sensitivity and specificity individually. In the presence of a gold standard, simultaneous comparison between two diagnostic tests for noninferiority of sensitivity and specificity based on an asymptotic approach has been studied by Chen et al. (2003). However, the asymptotic approach may suffer from unsatisfactory type I error control as observed from many studies, especially in small to medium sample settings. In this paper, we compare three unconditional approaches for simultaneously testing sensitivity and specificity. They are approaches based on estimation, maximization, and a combination of estimation and maximization. Although the estimation approach does not guarantee type I error, it has satisfactory performance with regard to type I error control. The other two unconditional approaches are exact. The approach based on estimation and maximization is generally more powerful than the approach based on maximization.

  7. Collection and Characterization of Samples for Establishment of a Serum Repository for Lyme Disease Diagnostic Test Development and Evaluation

    Science.gov (United States)

    Molins, Claudia R.; Sexton, Christopher; Young, John W.; Ashton, Laura V.; Pappert, Ryan; Beard, Charles B.

    2014-01-01

    Serological assays and a two-tiered test algorithm are recommended for laboratory confirmation of Lyme disease. In the United States, the sensitivity of two-tiered testing using commercially available serology-based assays is dependent on the stage of infection and ranges from 30% in the early localized disease stage to near 100% in late-stage disease. Other variables, including subjectivity in reading Western blots, compliance with two-tiered recommendations, use of different first- and second-tier test combinations, and use of different test samples, all contribute to variation in two-tiered test performance. The availability and use of sample sets from well-characterized Lyme disease patients and controls are needed to better assess the performance of existing tests and for development of improved assays. To address this need, the Centers for Disease Control and Prevention and the National Institutes of Health prospectively collected sera from patients at all stages of Lyme disease, as well as healthy donors and patients with look-alike diseases. Patients and healthy controls were recruited using strict inclusion and exclusion criteria. Samples from all included patients were retrospectively characterized by two-tiered testing. The results from two-tiered testing corroborated the need for novel and improved diagnostics, particularly for laboratory diagnosis of earlier stages of infection. Furthermore, the two-tiered results provide a baseline with samples from well-characterized patients that can be used in comparing the sensitivity and specificity of novel diagnostics. Panels of sera and accompanying clinical and laboratory testing results are now available to Lyme disease serological test users and researchers developing novel tests. PMID:25122862

  8. Building a Scoring Model for Small and Medium Enterprises

    Directory of Open Access Journals (Sweden)

    Răzvan Constantin CARACOTA

    2010-09-01

    The purpose of the paper is to produce a scoring model for small and medium enterprises seeking financing through a bank loan. To analyze the loan application, the scoring system developed for companies has two parts: scoring of quantitative factors and scoring of qualitative factors. We have estimated the probability of default using logistic regression. The regression coefficients were determined with a solver in Excel using five ratios as input data. Analyses and simulations were conducted on a sample of 113 companies, all accepted for funding. Based on financial information obtained over two years, 2007 and 2008, we could establish and assess the default value.
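
    In a logistic regression scoring model the probability of default is the logistic function of a linear combination of the input ratios. The sketch below shows only that final step. The abstract does not report the five ratios or the fitted coefficients, so every number and name here is hypothetical.

```python
import math


def default_probability(ratios, coefficients, intercept):
    """Logistic model: P(default) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    if len(ratios) != len(coefficients):
        raise ValueError("one coefficient is needed per ratio")
    z = intercept + sum(b * x for b, x in zip(coefficients, ratios))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted coefficients for five financial ratios (not from the paper).
coefficients = [-1.2, -0.8, 0.5, -0.3, 0.9]
intercept = -0.5
# Hypothetical ratios for one applicant company.
company = [0.4, 1.1, 0.7, 0.2, 0.3]
pd_estimate = default_probability(company, coefficients, intercept)
print(f"estimated probability of default: {pd_estimate:.3f}")
```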

  9. Using College Admission Test Scores to Clarify High School Placement. Leading Indicator Spotlight

    Science.gov (United States)

    Flug, Susanna

    2010-01-01

    In "Beyond Test Scores: Leading Indicators for Education," Foley and colleagues (2008) define leading indicators as those that "provide early signals of progress toward academic achievement" (p. 1) and stress that educators "need leading indicators to help them see the direction their efforts are going in and to take…

  10. Preliminary testing of the reliability and feasibility of SAGE: a system to measure and score engagement with and use of research in health policies and programs.

    Science.gov (United States)

    Makkar, Steve R; Williamson, Anna; D'Este, Catherine; Redman, Sally

    2017-12-19

    Few measures of research use in health policymaking are available, and the reliability of such measures has yet to be evaluated. A new measure called the Staff Assessment of Engagement with Evidence (SAGE) incorporates an interview that explores policymakers' research use within discrete policy documents and a scoring tool that quantifies the extent of policymakers' research use based on the interview transcript and analysis of the policy document itself. We aimed to conduct a preliminary investigation of the usability, sensitivity, and reliability of the scoring tool in measuring research use by policymakers. Nine experts in health policy research and two independent coders were recruited. Each expert used the scoring tool to rate a random selection of 20 interview transcripts, and each independent coder rated 60 transcripts. The distribution of scores among experts was examined, and then, interrater reliability was tested within and between the experts and independent coders. Average- and single-measure reliability coefficients were computed for each SAGE subscale. Experts' scores ranged from the limited to extensive scoring bracket for all subscales. Experts as a group also exhibited at least a fair level of interrater agreement across all subscales. Single-measure reliability was at least fair except for three subscales: Relevance Appraisal, Conceptual Use, and Instrumental Use. Average- and single-measure reliability among independent coders was good to excellent for all subscales. Finally, reliability between experts and independent coders was fair to excellent for all subscales. Among experts, the scoring tool was comprehensible, usable, and sensitive to discriminate between documents with varying degrees of research use. Secondly, the scoring tool yielded scores with good reliability among the independent coders. There was greater variability among experts, although as a group, the tool was fairly reliable. The alignment between experts' and independent

  11. Reproducibility of subgingival bacterial samples from patients with peri-implant mucositis

    DEFF Research Database (Denmark)

    Hallström, Hadar; Persson, G Rutger; Strömberg, Ulf

    2015-01-01

    collected with paper points and analyzed using the checkerboard DNA-DNA hybridization technique. Whole genomic probes of 74 preselected bacterial species were used. Based on the bacterial scores, Cohen's kappa coefficient was used to calculate the inter-annotator agreement for categorical data......OBJECTIVE: The aim of the present study was to investigate the reproducibility of bacterial enumeration from subsequent subgingival samples collected from patients with peri-implant mucositis. MATERIAL AND METHODS: Duplicate microbial samples from 222 unique implant sites in 45 adult subjects were....... The percentage agreement was considered as "good" when the two samples showed the same score or differed by 1 to the power of 10. RESULTS: Moderate to fair kappa values were displayed for all bacterial species in the test panel (range 0.21-0.58). There were no significant differences between Gram...

  12. Differences of wells scores accuracy, caprini scores and padua scores in deep vein thrombosis diagnosis

    Science.gov (United States)

    Gatot, D.; Mardia, A. I.

    2018-03-01

    Deep Vein Thrombosis (DVT) is a venous thrombus in the lower limbs. Diagnosis is made using venography or compression ultrasound. However, these examinations are not yet available in some health facilities. Therefore many scoring systems have been developed for the diagnosis of DVT. The scoring method is practical and safe to use, in addition to being efficacious and effective in terms of treatment and costs. The existing scoring systems are the Wells, Caprini and Padua scores. There have been many studies comparing the accuracy of these scores, but not in Medan. Therefore, we are interested in comparative research on the Wells, Caprini and Padua scores in Medan. An observational, analytical, case-control study was conducted to perform diagnostic tests on the Wells, Caprini and Padua scores to predict the risk of DVT. The study was conducted at H. Adam Malik Hospital in Medan. From a total of 72 subjects, 39 people (54.2%) were men and the mean age was 53.14 years. The Wells score, Caprini score and Padua score had a sensitivity of 80.6%; 61.1%, 50% respectively; specificity of 80.65; 66.7%; 75% respectively, and accuracy of 87.5%; 64.3%; 65.7% respectively. The Wells score has better sensitivity, specificity and accuracy than the Caprini and Padua scores in diagnosing DVT.

  13. Test-retest reliability and predictive validity of the Implicit Association Test in children.

    Science.gov (United States)

    Rae, James R; Olson, Kristina R

    2018-02-01

    The Implicit Association Test (IAT) is increasingly used in developmental research despite minimal evidence of whether children's IAT scores are reliable across time or predictive of behavior. When test-retest reliability and predictive validity have been assessed, the results have been mixed, and because these studies have differed on many factors simultaneously (lag-time between testing administrations, domain, etc.), it is difficult to discern what factors may explain variability in existing test-retest reliability and predictive validity estimates. Across five studies (total N = 519; ages 6- to 11-years-old), we manipulated two factors that have varied in previous developmental research-lag-time and domain. An internal meta-analysis of these studies revealed that, across three different methods of analyzing the data, mean test-retest (rs of .48, .38, and .34) and predictive validity (rs of .46, .20, and .10) effect sizes were significantly greater than zero. While lag-time did not moderate the magnitude of test-retest coefficients, whether we observed domain differences in test-retest reliability and predictive validity estimates was contingent on other factors, such as how we scored the IAT or whether we included estimates from a unique sample (i.e., a sample containing gender typical and gender diverse children). Recommendations are made for developmental researchers that utilize the IAT in their research. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  14. Fission product behavior during the first two PBF severe fuel damage tests

    International Nuclear Information System (INIS)

    Osetek, D.J.; Cronenberg, A.W.; Hobbins, R.R.; Vinjamuri, K.

    1984-01-01

    The results of the first two severe fuel damage tests performed in the Power Burst Facility are assessed in terms of fission product release and chemical behavior. On-line gamma spectroscopy and grab sample data indicate limited release during solid-phase fuel heatup. Analysis indicates that the fuel morphology conditions for the trace-irradiated fuel employed in these two tests limit initial release. Only upon high temperature fuel restructuring and liquefaction is significant release indicated. Chemical equilibrium predictions, based on steam oxidation or reduction conditions, indicate I to be the primary iodine species during transport in the steam environment of the first test and CsI to be the primary species during transport in the hydrogen environment of the second test. However, the higher steam flow rate conditions of the first test transported the released iodine through the sample system, whereas the low hydrogen flow rate of the second test apparently allowed the vast majority of iodine-bearing compounds to plate out during transport.

  15. Test plan for core sampling drill bit temperature monitor

    International Nuclear Information System (INIS)

    Francis, P.M.

    1994-01-01

    At WHC, one of the functions of the Tank Waste Remediation System division is sampling waste tanks to characterize their contents. The push-mode core sampling truck is currently used to take samples of liquid and sludge. Sampling of tanks containing hard salt cake is to be performed with the rotary-mode core sampling system, consisting of the core sample truck, mobile exhauster unit, and ancillary subsystems. When drilling through the salt cake material, friction and heat can be generated in the drill bit. Based upon tank safety reviews, it has been determined that the drill bit temperature must not exceed 180 C, due to the potential reactivity of tank contents at this temperature. Consequently, a drill bit temperature limit of 150 C was established for operation of the core sample truck to have an adequate margin of safety. Unpredictable factors, such as localized heating, cause this buffer to be so great. The most desirable safeguard against exceeding this threshold is bit temperature monitoring. This document describes the recommended plan for testing the prototype of a drill bit temperature monitor developed for core sampling by Sandia National Labs. The device will be tested at their facilities. This test plan documents the tests that Westinghouse Hanford Company considers necessary for effective testing of the system.

  16. A high COPD assessment test score may predict anxiety in COPD

    Directory of Open Access Journals (Sweden)

    Harryanto H

    2018-03-01

    Hilman Harryanto,1 Sally Burrows,2 Yuben Moodley1,2. 1Department of Respiratory Medicine, Fiona Stanley Hospital, Perth, WA, Australia; 2Faculty of Health and Medical Sciences, Medical School, University of Western Australia, Perth, WA, Australia. The prevalence of anxiety is 55% in patients with COPD,1 and it is associated with worse disease control. Therefore, early recognition and institution of treatment of this comorbidity significantly improve patients' quality of life. Recently, a questionnaire called the COPD assessment test (CAT) has been incorporated into the Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines for the management of COPD, and a higher score is associated with increased COPD symptoms.2 Considering the regular use of CAT, it was evaluated whether this tool can also be used to identify anxiety. The CAT score was correlated with the Hospital Anxiety and Depression Scale (HADS) to determine the level at which CAT may predict anxiety.

  17. Estimation of sample size and testing power (part 5).

    Science.gov (United States)

    Hu, Liang-ping; Bao, Xiao-lei; Guan, Xue; Zhou, Shi-guo

    2012-02-01

    Estimation of sample size and testing power is an important component of research design. This article introduces methods for estimating sample size and testing power for difference tests on quantitative and qualitative data under the single-group design, the paired design, and the crossover design. Specifically, it presents the formulas for these three designs, shows how to implement them both directly and with the POWER procedure of SAS software, and illustrates them with examples, which will help researchers implement the repetition principle.
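
    As a rough illustration of the kind of calculation this abstract describes, the sketch below estimates the number of pairs for a paired design with a quantitative outcome. It uses the common normal-approximation formula n = ((z_(1-alpha/2) + z_(1-beta)) * sd_diff / mean_diff)^2, which is an assumption here and not necessarily the formula used in the article; the function names and inputs are hypothetical. Exact t-based methods, such as those in SAS PROC POWER, typically give slightly larger values.

```python
import math
from statistics import NormalDist


def paired_sample_size(mean_diff: float, sd_diff: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Pairs needed for a two-sided paired test (normal approximation)."""
    if mean_diff == 0 or sd_diff <= 0:
        raise ValueError("mean_diff must be non-zero and sd_diff positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile for the target power
    n = ((z_alpha + z_beta) * sd_diff / mean_diff) ** 2
    return math.ceil(n)


def paired_power(n_pairs: int, mean_diff: float, sd_diff: float,
                 alpha: float = 0.05) -> float:
    """Approximate power for a given number of pairs (same approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    shift = abs(mean_diff) * math.sqrt(n_pairs) / sd_diff
    # The negligible probability of rejecting in the wrong tail is ignored.
    return z.cdf(shift - z_alpha)


# Hypothetical inputs: detect a mean paired difference of 0.5 with SD 1.0.
n_required = paired_sample_size(mean_diff=0.5, sd_diff=1.0)
print(n_required, round(paired_power(n_required, 0.5, 1.0), 3))
```

    Rounding the pair count up rather than to the nearest integer keeps the achieved power at or above the target under this approximation.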

  18. Bayley-III: Cultural differences and language scale validity in a Danish sample.

    Science.gov (United States)

    Krogh, Marianne T; Vaever, Mette S

    2016-12-01

    The purpose of this study was to investigate cultural differences between Danish and American children at 2 and 3 years as measured with the developmental test Bayley-III, and to investigate the Bayley-III Language Scale validity. The Danish children (N = 43) were tested with the Bayley-III and their parents completed an additional language questionnaire (the MacArthur-Bates CDI). Results showed that scores from the Danish children did not differ significantly from the American norms on the Cognitive or Motor Scale, but the Danish sample scored significantly higher on the Language Scale. A comparison of the Bayley-III Language subtests with the CDI showed that the two measures correlated significantly, but the percentile score from the CDI was significantly higher than the percentile score from the Bayley-III Language subtests. This could be because the two instruments measure slightly different areas of language development, or because the Bayley-III overestimates language development in Danish children. However, due to the limitations of the current study, further research is needed to clarify this issue. © 2016 Scandinavian Psychological Associations and John Wiley & Sons Ltd.

  19. Comprehensive School Reform and Standardized Test Scores in Illinois Elementary and Middle Schools

    Science.gov (United States)

    McEnroe, James D.

    2010-01-01

    The study examined the effects of the federally funded Comprehensive School Reform (CSR) program on student performance on mandated standardized tests. The study focused on the mathematics and reading scores of Illinois public elementary and middle and junior high school students. The federal CSR program provided Illinois schools with an annual…

  20. Combination of scoring schemes for protein docking

    Directory of Open Access Journals (Sweden)

    Schomburg Dietmar

    2007-08-01

    Background: Docking algorithms are developed to predict in which orientation two proteins are likely to bind under natural conditions. The currently used methods usually consist of a sampling step followed by a scoring step. We developed a weighted geometric correlation based on optimised atom specific weighting factors and combined them with our previously published amino acid specific scoring and with a comprehensive SVM-based scoring function. Results: The scoring with the atom specific weighting factors yields better results than the amino acid specific scoring. In combination with SVM-based scoring functions the percentage of complexes for which a near native structure can be predicted within the top 100 ranks increased from 14% with the geometric scoring to 54% with the combination of all scoring functions. Especially for the enzyme-inhibitor complexes the results of the ranking are excellent. For half of these complexes a near-native structure can be predicted within the first 10 proposed structures and for more than 86% of all enzyme-inhibitor complexes within the first 50 predicted structures. Conclusion: We were able to develop a combination of different scoring schemes which considers a series of previously described and some new scoring criteria yielding a remarkable improvement of prediction quality.

  1. Testing of Small Graphite Samples for Nuclear Qualification

    Energy Technology Data Exchange (ETDEWEB)

    Julie Chapman

    2010-11-01

    Accurately determining the mechanical properties of small irradiated samples is crucial to predicting the behavior of the overall irradiated graphite components within a Very High Temperature Reactor. The sample size allowed in a material test reactor, however, is limited, and this poses some difficulties with respect to mechanical testing. In the case of graphite with a larger grain size, a small sample may exhibit characteristics not representative of the bulk material, leading to inaccuracies in the data. A study to determine a potential size effect on the tensile strength was pursued under the Next Generation Nuclear Plant program. It focuses first on optimizing the tensile testing procedure identified in the American Society for Testing and Materials (ASTM) Standard C 781-08. Once the testing procedure was verified, a size effect was assessed by gradually reducing the diameter of the specimens. By monitoring the material response, a size effect was successfully identified.

  2. Myocardial perfusion imaging and coronary calcium scoring with a two-slice SPECT/CT system: can the attenuation map be calculated from the calcium scoring CT scan?

    Energy Technology Data Exchange (ETDEWEB)

    Wenning, Christian; Rahbar, Kambiz; Schober, Otmar; Stegger, Lars [University of Muenster, Department of Nuclear Medicine, Muenster (Germany); Vrachimis, Alexis; Schaefers, Michael [University of Muenster, Department of Nuclear Medicine, Muenster (Germany); University of Muenster, European Institute for Molecular Imaging, Muenster (Germany)

    2013-07-15

    Coronary artery calcium scoring can complement myocardial perfusion imaging (MPI). The purpose of this study was to evaluate the feasibility and accuracy of using the CalciumScore-CT derived from a combined SPECT/CT device also for SPECT attenuation correction (AC). The study group comprised 99 patients who underwent both post-stress and rest MPI using a two-slice SPECT/CT system. For AC, one of the two scans was accompanied by a CalciumScore-CT scan (CalciumScore-CTAC) and the other by a conventional spiral CT (AttenCorr-CT) scan (AttenCorr-CTAC). In 48 patients the CalciumScore-CT scan was acquired with the post-stress scan and the AttenCorr-CT scan with the rest scan, and in 51 patients the order was reversed. The accuracy of the images based on AC was determined qualitatively by consensus reading with respect to the clinical diagnoses as well as quantitatively by comparing the perfusion summed stress scores (SSS) and the summed rest scores (SRS) between attenuation-corrected and uncorrected images. In comparison to the uncorrected images CalciumScore-CTAC led to regional inaccuracies in 14 of 51 studies (27.5 %) versus 12 of 48 studies (25 %) with AttenCorr-CTAC for the stress studies and in 5 of 48 (10 %) versus 1 of 51 (2 %) for the rest studies, respectively. This led to intermediate and definite changes in the final diagnosis (ischaemia and/or scarring) in 12 % of the studies (12 of 99) and in 7 % of the studies (7 of 99) with CalciumScore-CTAC and in 9 % of the studies (9 of 99) and 4 % of the studies (4 of 99) with AttenCorr-CTAC. Differences in SSS and SRS with respect to the uncorrected images were greater for the CalciumScore-CTAC images than for the AttenCorr-CTAC images (ΔSSS 4.5 ± 5.6 and 2.1 ± 4.4, p = 0.023; ΔSRS 4.2 ± 4.9 and 1.6 ± 3.2, p = 0.004, respectively). Using the same CT scan for calcium scoring and SPECT AC is feasible. Image interpretation must, however, include uncorrected images since CT-based AC relatively

  3. Clinical evaluation of human papillomavirus detection by careHPV™ test on physician-samples and self-samples using the indicating FTA Elute® card.

    Science.gov (United States)

    Wang, Shao-Ming; Hu, Shang-Ying; Chen, Feng; Chen, Wen; Zhao, Fang-Hui; Zhang, Yu-Qing; Ma, Xin-Ming; Qiao, You-Lin

    2014-01-01

    The aim was to clinically evaluate a solid-state human papillomavirus (HPV) sampling medium in combination with an economical HPV testing method (careHPV™) for cervical cancer screening. A total of 396 women aged 25-65 years were enrolled for cervical cancer screening, and four samples were collected from each. Two samples were collected by the women themselves; one was stored in DCM preservative solution (the "liquid sample") and the other was applied to the Whatman Indicating FTA Elute® card (FTA card). The other two samples were collected by a physician and stored in DCM preservative solution and on an FTA card, respectively. All samples were tested with the careHPV™ test. All women underwent colposcopy, and biopsies were taken for pathological confirmation if necessary. The FTA card demonstrated sensitivity comparable to that of the liquid sample carrier for detecting high-grade cervical intraepithelial neoplasia (CIN) with both self- and physician-sampling, but showed higher specificity than the liquid sample carrier for self-sampling (FTA vs. liquid: 79.0% vs. 71.6%, p=0.02). Overall, the FTA card had accuracy comparable to that of the liquid-based medium across sampling operators, with an area under the curve of 0.807 for physician and FTA, 0.781 for physician and liquid, 0.728 for self and FTA, and 0.733 for self and liquid (p>0.05). The FTA card is a promising sample carrier for cervical cancer screening. With appropriate education programmes and further optimization of the experimental workflow, FTA card-based self-collection in combination with centralized careHPV™ testing can help expand the coverage of cervical cancer screening in low-resource areas.

  4. The power to detect linkage in complex disease by means of simple LOD-score analyses.

    Science.gov (United States)

    Greenberg, D A; Abreu, P; Hodge, S E

    1998-09-01

    Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are >=80% of the ELOD under the true model when the ELOD for the true model is >=3. Similarly, the power to reach a given LOD score was usually >=80% that of the true model, when the power under the true model was >=60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage.

  5. Propensity-score matching in the cardiovascular surgery literature from 2004 to 2006: a systematic review and suggestions for improvement.

    Science.gov (United States)

    Austin, Peter C

    2007-11-01

    I conducted a systematic review of the use of propensity score matching in the cardiovascular surgery literature. I examined the adequacy of reporting and whether appropriate statistical methods were used. I examined 60 articles published in the Annals of Thoracic Surgery, European Journal of Cardio-thoracic Surgery, Journal of Cardiovascular Surgery, and the Journal of Thoracic and Cardiovascular Surgery between January 1, 2004, and December 31, 2006. Thirty-one of the 60 studies did not provide adequate information on how the propensity score-matched pairs were formed. Eleven (18%) of studies did not report on whether matching on the propensity score balanced baseline characteristics between treated and untreated subjects in the matched sample. No studies used appropriate methods to compare baseline characteristics between treated and untreated subjects in the propensity score-matched sample. Eight (13%) of the 60 studies explicitly used statistical methods appropriate for the analysis of matched data when estimating the effect of treatment on the outcomes. Two studies used appropriate methods for some outcomes, but not for all outcomes. Thirty-nine (65%) studies explicitly used statistical methods that were inappropriate for matched-pairs data when estimating the effect of treatment on outcomes. Eleven studies did not report the statistical tests that were used to assess the statistical significance of the treatment effect. Analysis of propensity score-matched samples tended to be poor in the cardiovascular surgery literature. Most statistical analyses ignored the matched nature of the sample. I provide suggestions for improving the reporting and analysis of studies that use propensity score matching.
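
    To make concrete what "methods appropriate for matched data" can mean for a binary outcome, the sketch below implements an exact McNemar test, one standard choice for paired binary data. It is offered as an illustration under that assumption, not as the specific method the review recommends; the function name and the counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: pairs where only the treated subject had the event
    c: pairs where only the untreated subject had the event
    Concordant pairs carry no information about the effect and are ignored.
    """
    if b < 0 or c < 0:
        raise ValueError("discordant pair counts must be non-negative")
    n = b + c
    if n == 0:
        return 1.0
    # Under the null, each discordant pair is equally likely to go either way,
    # so min(b, c) follows a Binomial(n, 0.5) lower tail.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts from a matched sample.
p_value = mcnemar_exact(b=10, c=2)
print(f"exact McNemar p = {p_value:.4f}")
```

    The test conditions on the pairs in which the two matched subjects disagree, which is exactly the information an unpaired chi-square test on the same data would discard.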

  6. Assessment of coronary calcification using calibrated mass score with two different multidetector computed tomography scanners in the Copenhagen General Population Study

    Energy Technology Data Exchange (ETDEWEB)

    Fuchs, Andreas [Department of Cardiology, The Heart Centre, Rigshospitalet, University of Copenhagen, Copenhagen (Denmark); Groen, Jaap M. [Department of Radiology, University of Groningen, University Medical Center Groningen (Netherlands); Department of Medical Physics, OLVG, Amsterdam (Netherlands); Arnold, Ben A. [Image Analysis, 1380 Burkesville Road, Columbia, KY (United States); Nikolovski, Sasho [Department of Radiology, University of Groningen, University Medical Center Groningen (Netherlands); Knudsen, Andreas D., E-mail: dehlbaek@gmail.com [Department of Cardiology, The Heart Centre, Rigshospitalet, University of Copenhagen, Copenhagen (Denmark); Kühl, J. Tobias [Department of Cardiology, The Heart Centre, Rigshospitalet, University of Copenhagen, Copenhagen (Denmark); Nordestgaard, Børge G. [Department of Clinical Biochemistry and the Copenhagen General Population Study, Herlev Hospital, University of Copenhagen (Denmark); Greuter, Marcel J.W. [Department of Radiology, University of Groningen, University Medical Center Groningen (Netherlands); Kofoed, Klaus F. [Department of Cardiology, The Heart Centre, Rigshospitalet, University of Copenhagen, Copenhagen (Denmark); Department of Radiology, The Diagnostic Centre, Rigshospitalet, University of Copenhagen, Copenhagen (Denmark)

    2017-03-15

    Objective: Population studies have shown coronary calcium score to improve risk stratification in subjects suspected for cardiovascular disease. The aim of this work was to assess the validity of multidetector computed tomography (MDCT) for measurement of calibrated mass scores (MS) in a phantom study, and to investigate inter-scanner variability for MS and Agatston score (AS) recorded in a population study on two different high-end MDCT scanners. Materials and methods: A calcium phantom was scanned by a first (A) and second (B) generation 320-MDCT. MS was measured for each calcium deposit from repeated measurements in each scanner and compared to known physical phantom mass. Random samples of human subjects from the Copenhagen General Population Study were scanned with scanner A (N = 254) and scanner B (N = 253) where MS and AS distributions of these two groups were compared. Results: The mean total MS of the phantom was 32.9 ± 0.8 mg and 33.1 ± 0.9 mg (p = 0.43) assessed by scanner A and B respectively – the physical calcium mass was 34.0 mg. Correlation between measured MS and physical calcium mass was R² = 0.99 in both scanners. In the population study the median total MS was 16.8 mg (interquartile range (IQR): 3.5–81.1) and 15.8 mg (IQR: 3.8–63.4) in scanner A and B (p = 0.88). The corresponding median total AS were 92 (IQR: 23–471) and 89 (IQR: 40–384) (p = 0.64). Conclusion: Calibrated calcium mass score may be assessed with very high accuracy in a calcium phantom by different generations of 320-MDCT scanners. In population studies, it appears acceptable to pool calcium scores acquired on different 320-MDCT scanners.

  7. Specific algorithm method of scoring the Clock Drawing Test applied in cognitively normal elderly

    Directory of Open Access Journals (Sweden)

    Liana Chaves Mendes-Santos

    The Clock Drawing Test (CDT) is an inexpensive, fast and easily administered measure of cognitive function, especially in the elderly. This instrument is a popular clinical tool widely used in screening for cognitive disorders and dementia. The CDT can be applied in different ways and scoring procedures also vary. OBJECTIVE: The aims of this study were to analyze the performance of elderly on the CDT and evaluate inter-rater reliability of the CDT scored by using a specific algorithm method adapted from Sunderland et al. (1989). METHODS: We analyzed the CDT of 100 cognitively normal elderly aged 60 years or older. The CDT ("free-drawn") and Mini-Mental State Examination (MMSE) were administered to all participants. Six independent examiners scored the CDT of 30 participants to evaluate inter-rater reliability. RESULTS AND CONCLUSION: A score of 5 on the proposed algorithm ("Numbers in reverse order or concentrated"), equivalent to 5 points on the original Sunderland scale, was the most frequent (53.5%). The CDT specific algorithm method used had high inter-rater reliability (p<0.01), and mean score ranged from 5.06 to 5.96. The high frequency of an overall score of 5 points may suggest the need to create more nuanced evaluation criteria, which are sensitive to differences in levels of impairment in visuoconstructive and executive abilities during aging.

  8. Exploration of the (Interrater) Reliability and Latent Factor Structure of the Alcohol Use Disorders Identification Test (AUDIT) and the Drug Use Disorders Identification Test (DUDIT) in a Sample of Dutch Probationers.

    Science.gov (United States)

    Hildebrand, Martin; Noteborn, Mirthe G C

    2015-01-01

    The use of brief, reliable, valid, and practical measures of substance use is critical for conducting individual (risk and need) assessments in probation practice. In this exploratory study, the basic psychometric properties of the Alcohol Use Disorders Identification Test (AUDIT) and the Drug Use Disorders Identification Test (DUDIT) are evaluated. The instruments were administered as an oral interview instead of a self-report questionnaire. The sample comprised 383 offenders (339 men, 44 women). A subset of 56 offenders (49 men, 7 women) participated in the interrater reliability study. Data collection took place between September 2011 and November 2012. Overall, both instruments have acceptable levels of interrater reliability for total scores and acceptable to good interrater reliabilities for most of the individual items. Confirmatory factor analyses (CFA) indicated that the a priori one-, two- and three-factor solutions for the AUDIT did not fit the observed data very well. Principal axis factoring (PAF) supported a two-factor solution for the AUDIT that included a level of alcohol consumption/consequences factor (Factor 1) and a dependence factor (Factor 2), with both factors explaining substantial variance in AUDIT scores. For the DUDIT, CFA and PAF suggest that a one-factor solution is the preferred model (accounting for 62.61% of total variance). The Dutch language versions of the AUDIT and the DUDIT are reliable screening instruments for use with probationers and both instruments can be reliably administered by probation officers in probation practice. However, future research on concurrent and predictive validity is warranted.

  9. Acceptance sampling for attributes via hypothesis testing and the hypergeometric distribution

    Science.gov (United States)

    Samohyl, Robert Wayne

    2017-10-01

    This paper questions some aspects of attribute acceptance sampling in light of the original concepts of hypothesis testing from Neyman and Pearson (NP). Attribute acceptance sampling in industry, as developed by Dodge and Romig (DR), generally follows the international standards of ISO 2859, and similarly the Brazilian standards NBR 5425 to NBR 5427 and the United States Standards ANSI/ASQC Z1.4. The paper evaluates and extends the area of acceptance sampling in two directions. First, by suggesting the use of the hypergeometric distribution to calculate the parameters of sampling plans avoiding the unnecessary use of approximations such as the binomial or Poisson distributions. We show that, under usual conditions, discrepancies can be large. The conclusion is that the hypergeometric distribution, ubiquitously available in commonly used software, is more appropriate than other distributions for acceptance sampling. Second, and more importantly, we elaborate the theory of acceptance sampling in terms of hypothesis testing rigorously following the original concepts of NP. By offering a common theoretical structure, hypothesis testing from NP can produce a better understanding of applications even beyond the usual areas of industry and commerce such as public health and political polling. With the new procedures, both sample size and sample error can be reduced. What is unclear in traditional acceptance sampling is the necessity of linking the acceptable quality limit (AQL) exclusively to the producer and the lot quality percent defective (LTPD) exclusively to the consumer. In reality, the consumer should also be preoccupied with a value of AQL, as should the producer with LTPD. Furthermore, we can also question why type I error is always uniquely associated with the producer as producer risk, and likewise, the same question arises with consumer risk which is necessarily associated with type II error. The resolution of these questions is new to the literature. The
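
    The contrast the abstract draws between the hypergeometric distribution and its binomial approximation can be sketched numerically. The example below computes the probability of accepting a lot under a single sampling plan both ways; the function names, lot size, defect count, sample size, and acceptance number are hypothetical values chosen for illustration and do not come from the paper or from any of the standards it cites.

```python
from math import comb


def accept_prob_hypergeom(lot_size: int, defectives: int,
                          sample_size: int, accept_number: int) -> float:
    """P(accept) when sampling without replacement from a finite lot."""
    total = comb(lot_size, sample_size)
    good = lot_size - defectives
    k_max = min(accept_number, defectives, sample_size)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible splits contribute nothing to the sum.
    ways = sum(comb(defectives, k) * comb(good, sample_size - k)
               for k in range(k_max + 1))
    return ways / total


def accept_prob_binomial(defect_rate: float, sample_size: int,
                         accept_number: int) -> float:
    """P(accept) under the binomial approximation (sampling with replacement)."""
    k_max = min(accept_number, sample_size)
    return sum(comb(sample_size, k)
               * defect_rate ** k * (1 - defect_rate) ** (sample_size - k)
               for k in range(k_max + 1))


# Hypothetical plan: lot of 50 with 5 defectives, sample 10, accept on 0 defects.
p_hyper = accept_prob_hypergeom(lot_size=50, defectives=5,
                                sample_size=10, accept_number=0)
p_binom = accept_prob_binomial(defect_rate=5 / 50, sample_size=10,
                               accept_number=0)
print(f"hypergeometric: {p_hyper:.4f}, binomial: {p_binom:.4f}")
```

    In this setup the sample is a fifth of the lot, so the two distributions differ visibly; the gap shrinks as the lot grows relative to the sample.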

  10. Diagnosing unilateral primary aldosteronism - comparison of a clinical prediction score, computed tomography and adrenal venous sampling.

    Science.gov (United States)

    Sze, W C Candy; Soh, Lip Min; Lau, Jeshen H; Reznek, Rodney; Sahdev, Anju; Matson, Matthew; Riddoch, Fiona; Carpenter, Robert; Berney, Dan; Grossman, Ashley B; Chew, Shern L; Akker, Scott A; Druce, Maralyn R; Waterhouse, Mona; Monson, John P; Drake, William M

    2014-07-01

    In patients with primary aldosteronism (PA), adrenalectomy is potentially curative for those correctly identified as having unilateral excessive aldosterone production. It has been suggested that a recently developed and published clinical prediction score (CPS) may correctly identify some patients as having unilateral disease, without recourse to adrenal venous sampling. We have applied the CPS to a large cohort of PA patients with defined and documented outcomes. We also incorporated a minor modification to the CPS and a radiological grading score (RGS) into our analysis to assess whether its performance could be augmented. A total of 75 patients with a robust diagnosis following bilateral adrenal venous cannulation and/or strictly defined surgical outcome were analysed. Applying the CPS to this group of patients produced a sensitivity of 38·8% and a specificity of 88·5% of correctly identifying unilateral aldosterone production. Using a suggested modification to the CPS, in which different levels of hypokalaemia were given different weightings, the sensitivity rose to 40·8%, with an identical specificity. Using the RGS alone improved sensitivity to 91·7%, but specificity was reduced to 62·5%. Applying the recently developed CPS to this cohort of patients, it was not possible to reproduce the 100% specificity reported in the original publication. Using the modified score or incorporating the RGS did not improve its performance. In this cohort, we were unable to show superiority of the CPS over an imaging-based strategy. CPS may have a role in guiding clinical decision-making, especially in those whose adrenal venous sampling (AVS) has been unsuccessful. © 2013 John Wiley & Sons Ltd.

  11. Parent Rated Symptoms of Inattention in Childhood Predict High School Academic Achievement Across Two Culturally and Diagnostically Diverse Samples

    Directory of Open Access Journals (Sweden)

    Astri J. Lundervold

    2017-08-01

    Objective: To investigate parent reports of childhood symptoms of inattention as a predictor of adolescent academic achievement, taking into account the impact of the child’s intellectual functioning, in two diagnostically and culturally diverse samples. Method: Samples: (a) an all-female sample in the U.S. predominated by youth with ADHD (Berkeley Girls with ADHD Longitudinal Study [BGALS], N = 202), and (b) a mixed-sex sample recruited from a Norwegian population-based sample (the Bergen Child Study [BCS], N = 93). Inattention and intellectual function were assessed via the same measures in the two samples; academic achievement scores during and beyond high school and demographic covariates were country-specific. Results: Childhood inattention predicted subsequent academic achievement in both samples, with a somewhat stronger effect in the BGALS sample, which included a large subgroup of children with ADHD. Intellectual function was another strong predictor, but the effect of early inattention remained statistically significant in both samples when intellectual function was covaried. Conclusion: The effect of early indicators of inattention on future academic success was robust across the two samples. These results support the use of remediation procedures broadly applied. Future longitudinal multicenter studies with pre-planned common inclusion criteria should be performed to increase our understanding of the importance of inattention in primary school children for concurrent and prospective functioning.

  12. Distribution of Total Depressive Symptoms Scores and Each Depressive Symptom Item in a Sample of Japanese Employees.

    Science.gov (United States)

    Tomitaka, Shinichiro; Kawasaki, Yohei; Ide, Kazuki; Yamada, Hiroshi; Miyake, Hirotsugu; Furukawa, Toshiaki A

    2016-01-01

    In a previous study, we reported that the distribution of total depressive symptoms scores according to the Center for Epidemiologic Studies Depression Scale (CES-D) in a general population is stable throughout middle adulthood and follows an exponential pattern except for at the lowest end of the symptom score. Furthermore, the individual distributions of 16 negative symptom items of the CES-D exhibit a common mathematical pattern. To confirm the reproducibility of these findings, we investigated the distribution of total depressive symptoms scores and 16 negative symptom items in a sample of Japanese employees. We analyzed 7624 employees aged 20-59 years who had participated in the Northern Japan Occupational Health Promotion Centers Collaboration Study for Mental Health. Depressive symptoms were assessed using the CES-D. The CES-D contains 20 items, each of which is scored in four grades: "rarely," "some," "much," and "most of the time." The descriptive statistics and frequency curves of the distributions were then compared according to age group. The distribution of total depressive symptoms scores appeared to be stable from 30-59 years. The right tail of the distribution for ages 30-59 years exhibited a linear pattern with a log-normal scale. The distributions of the 16 individual negative symptom items of the CES-D exhibited a common mathematical pattern which displayed different distributions with a boundary at "some." The distributions of the 16 negative symptom items from "some" to "most" followed a linear pattern with a log-normal scale. The distributions of the total depressive symptoms scores and individual negative symptom items in a Japanese occupational setting show the same patterns as those observed in a general population. These results show that the specific mathematical patterns of the distributions of total depressive symptoms scores and individual negative symptom items can be reproduced in an occupational population.

  13. Percentiles of the null distribution of 2 maximum lod score tests.

    Science.gov (United States)

    Ulgen, Ayse; Yoo, Yun Joo; Gordon, Derek; Finch, Stephen J; Mendell, Nancy R

    2004-01-01

    We here consider the null distribution of the maximum lod score (LOD-M) obtained upon maximizing over transmission model parameters (penetrance values, dominance, and allele frequency) as well as the recombination fraction. Also considered is the lod score maximized over a fixed choice of genetic model parameters and recombination-fraction values set prior to the analysis (MMLS) as proposed by Hodge et al. The objective is to fit parametric distributions to MMLS and LOD-M. Our results are based on 3,600 simulations of samples of n = 100 nuclear families ascertained for having one affected member and at least one other sibling available for linkage analysis. Each null distribution is approximately a mixture pχ²(0) + (1 - p)χ²(v). The values of MMLS appear to fit the mixture 0.20χ²(0) + 0.80χ²(1.6). The mixture distribution 0.13χ²(0) + 0.87χ²(2.8) appears to describe the null distribution of LOD-M. From these results we derive a simple method for obtaining critical values of LOD-M and MMLS. Copyright 2004 S. Karger AG, Basel
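
    The abstract does not spell out the authors' method for critical values, so the following is only a minimal sketch of one way such a value could be read off a fitted mixture. It assumes the statistic X equals 0 with probability p and follows a chi-square distribution with v (possibly fractional) degrees of freedom otherwise, so that P(X > c) = (1 - p) P(χ²(v) > c). The function names are hypothetical, and whether the mixture applies to the lod score itself or to a rescaled version (for example 2 ln(10) times the lod score) is not stated in the abstract and is left open here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable; df may be fractional."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Power series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def mixture_critical_value(alpha, p_zero, df, upper=200.0):
    """Critical value c with P(X > c) = alpha under the assumed mixture.

    X is 0 with probability p_zero and chi-square(df) otherwise (an assumption
    about how the fitted mixture is to be used, not the authors' stated method).
    """
    if not 0.0 < alpha < 1.0 - p_zero:
        raise ValueError("alpha must lie strictly between 0 and 1 - p_zero")
    target = alpha / (1.0 - p_zero)  # tail probability required of the chi-square part
    lo, hi = 0.0, upper
    for _ in range(200):  # bisection on the monotone survival function
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical usage with the fitted values quoted in the abstract.
mmls_cutoff = mixture_critical_value(0.05, 0.20, 1.6)
lodm_cutoff = mixture_critical_value(0.05, 0.13, 2.8)
```

    Because the point mass at zero never exceeds a positive cutoff, only the chi-square component contributes to the tail, which is why the target tail probability is alpha divided by (1 - p).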

  14. Using Automated Essay Scores as an Anchor When Equating Constructed Response Writing Tests

    Science.gov (United States)

    Almond, Russell G.

    2014-01-01

    Assessments consisting of only a few extended constructed response items (essays) are not typically equated using anchor test designs as there are typically too few essay prompts in each form to allow for meaningful equating. This article explores the idea that output from an automated scoring program designed to measure writing fluency (a common…

  15. Implications of Deployed and Nondeployed Fathers on Seventh Graders' California Achievement Test Scores during a Military Crisis.

    Science.gov (United States)

    Pisano, Mark C.

    The differences in California Achievement Test (CAT) scores from 1990 to 1991 in seventh graders, currently enrolled in Albritton Junior High School in the Fort Bragg Schools, of deployed and nondeployed fathers were analyzed. CAT percentile scores from 1990 and 1991 (1991 being the year of "Desert Storm") were obtained in reading, math…

  16. Robustness to non-normality of common tests for the many-sample location problem

    Directory of Open Access Journals (Sweden)

    Azmeri Khan

    2003-01-01

    Full Text Available This paper studies the effect of deviating from the normal distribution assumption when considering the power of two many-sample location test procedures: ANOVA (parametric) and Kruskal-Wallis (non-parametric). Power functions for these tests under various conditions are produced using simulation, where the simulated data are produced using MacGillivray and Cannon's [10] recently suggested g-and-k distribution. This distribution can provide data with selected amounts of skewness and kurtosis by varying two nearly independent parameters.

  17. 40 CFR 205.160-2 - Test sample selection and preparation.

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment ... Test sample selection and preparation. (a) Vehicles comprising the sample which are required to be tested... maintained in any manner unless such preparation, tests, modifications, adjustments or maintenance are part...

  18. How Well Does the Sum Score Summarize the Test? Summability as a Measure of Internal Consistency

    NARCIS (Netherlands)

    Goeman, J.J.; De Jong, N.H.

    2018-01-01

    Many researchers use Cronbach's alpha to demonstrate internal consistency, even though it has been shown numerous times that Cronbach's alpha is not suitable for this. Because the intention of questionnaire and test constructers is to summarize the test by its overall sum score, we advocate

  19. Predicting neuropsychological test performance on the basis of temporal orientation.

    Science.gov (United States)

    Ryan, Joseph J; Glass, Laura A; Bartels, Jared M; Bergner, CariAnn M; Paolo, Anthony M

    2009-05-01

    Temporal orientation is often disrupted in the context of psychiatric or neurological disease; tests assessing this function are included in most mental status examinations. The present study examined the relationship between scores on the Temporal Orientation Scale (TOS) and performance on a battery of tests that assess memory, language, and cognitive functioning in a sample of patients with Alzheimer's disease (N = 55). Pearson product-moment correlations showed that, in all but two instances, the TOS was significantly correlated with each neuropsychological measure, p values ≤ .05. Also, severely disoriented (i.e., TOS score ≤ -8) patients were consistently 'impaired' on memory tests but not on tests of language and general cognitive functioning.

  20. Rey's Auditory Verbal Learning Test scores can be predicted from whole brain MRI in Alzheimer's disease

    Directory of Open Access Journals (Sweden)

    Elaheh Moradi

    2017-01-01

    Full Text Available Rey's Auditory Verbal Learning Test (RAVLT) is a powerful neuropsychological tool for testing episodic memory, which is widely used for the cognitive assessment in dementia and pre-dementia conditions. Several studies have shown that an impairment in RAVLT scores reflects well the underlying pathology caused by Alzheimer's disease (AD), thus making RAVLT an effective early marker to detect AD in persons with memory complaints. We investigated the association between RAVLT scores (RAVLT Immediate and RAVLT Percent Forgetting) and the structural brain atrophy caused by AD. The aim was to comprehensively study to what extent the RAVLT scores are predictable based on structural magnetic resonance imaging (MRI) data using machine learning approaches as well as to find the most important brain regions for the estimation of RAVLT scores. For this, we built a predictive model to estimate RAVLT scores from gray matter density via an elastic net penalized linear regression model. The proposed approach provided highly significant cross-validated correlation between the estimated and observed RAVLT Immediate (R = 0.50) and RAVLT Percent Forgetting (R = 0.43) in a dataset consisting of 806 AD, mild cognitive impairment (MCI) or healthy subjects. In addition, the selected machine learning method provided more accurate estimates of RAVLT scores than the relevance vector regression used earlier for the estimation of RAVLT based on MRI data. The top predictors were medial temporal lobe structures and amygdala for the estimation of RAVLT Immediate and angular gyrus, hippocampus and amygdala for the estimation of RAVLT Percent Forgetting. Further, the conversion of MCI subjects to AD within 3 years could be predicted based on either observed or estimated RAVLT scores with an accuracy comparable to MRI-based biomarkers.

  1. Towards reporting standards for neuropsychological study results: A proposal to minimize communication errors with standardized qualitative descriptors for normalized test scores.

    Science.gov (United States)

    Schoenberg, Mike R; Rum, Ruba S

    2017-11-01

    Rapid, clear and efficient communication of neuropsychological results is essential to benefit patient care. Errors in communication are a leading cause of medical errors; nevertheless, there remains a lack of consistency in how neuropsychological scores are communicated. A major limitation in the communication of neuropsychological results is the inconsistent use of qualitative descriptors for standardized test scores and the use of vague terminology. A PubMed search from 1 Jan 2007 to 1 Aug 2016 was conducted to identify guidelines or consensus statements for the description and reporting of qualitative terms to communicate neuropsychological test scores. The review found the use of confusing and overlapping terms to describe various ranges of percentile standardized test scores. In response, we propose a simplified set of qualitative descriptors for normalized test scores (Q-Simple) as a means to reduce errors in communicating test results. The Q-Simple qualitative terms are: 'very superior', 'superior', 'high average', 'average', 'low average', 'borderline' and 'abnormal/impaired'. A case example illustrates the proposed Q-Simple qualitative classification system to communicate neuropsychological results for neurosurgical planning. The Q-Simple qualitative descriptor system is intended as a means to improve and standardize communication of standardized neuropsychological test scores. Research is needed to further evaluate neuropsychological communication errors. Conveying the clinical implications of neuropsychological results in a manner that minimizes risk for communication errors is a quintessential component of evidence-based practice. Copyright © 2017 Elsevier B.V. All rights reserved.

  2. 30 CFR 14.5 - Test samples.

    Science.gov (United States)

    2010-07-01

    ... MINING PRODUCTS REQUIREMENTS FOR THE APPROVAL OF FLAME-RESISTANT CONVEYOR BELTS General Provisions § 14.5 Test samples. Upon request by MSHA, the applicant must submit 3 precut, unrolled, flat conveyor belt...

  3. Genome scan for linkage to asthma using a linkage disequilibrium-lod score test.

    Science.gov (United States)

    Jiang, Y; Slager, S L; Huang, J

    2001-01-01

    We report a genome-wide linkage study of asthma on the German and Collaborative Study on the Genetics of Asthma (CSGA) data. Using a combined linkage and linkage disequilibrium test and the nonparametric linkage score, we identified 13 markers from the German data, 1 marker from the African American (CSGA) data, and 7 markers from the Caucasian (CSGA) data in which the p-values ranged between 0.0001 and 0.0100. From our analysis and taking into account previous published linkage studies of asthma, we suggest that three regions in chromosome 5 (around D5S418, D5S644, and D5S422), one region in chromosome 6 (around three neighboring markers D6S1281, D6S291, and D6S1019), one region in chromosome 11 (around D11S2362), and two regions in chromosome 12 (around D12S351 and D12S324) especially merit further investigation.

  4. Relationship between substances in seminal plasma and Acrobeads Test results.

    Science.gov (United States)

    Komori, Kazuhiko; Tsujimura, Akira; Okamoto, Yoshio; Matsuoka, Yasuhiro; Takao, Tetsuya; Miyagawa, Yasushi; Takada, Shingo; Nonomura, Norio; Okuyama, Akihiko

    2009-01-01

    To assess the effects of seminal plasma on sperm function. Retrospective case-control study. University hospital. One hundred fourteen infertile men. Acrobeads Test scores (0-4) and measurement of interleukin (IL)-6, soluble IL-6 receptor, epidermal growth factor, insulin-like growth factor-I (IGF-I), transforming growth factor-beta I, superoxide dismutase, calcitonin, and macrophage migration inhibitory factor (MIF) levels in seminal plasma. Kruskal-Wallis test to compare the concentrations of substances as a nonparametric test for differences among Acrobeads Test scores and a multivariable logistic regression model to find independent risk factors associated with abnormal Acrobeads Test results. The Acrobeads Test score was 0 for 7 samples, 1 for 20 samples, 2 for 18 samples, 3 for 28 samples, and 4 for 41 samples. Age, abstinence period, and semen parameters, except for sperm motility and percentage of sperm with abnormal morphology, had no effect on the Acrobeads Test results. Concentrations of IGF-I and MIF were significantly higher in patients with abnormal Acrobeads Test results. Multivariate analysis indicated that MIF and IGF-I were significantly associated with abnormal Acrobeads Test results (scores 0 to 1). Although further studies are needed, IGF-I and MIF in seminal plasma may have negative effects on sperm function.

  5. Basic distribution free identification tests for small size samples of environmental data

    Energy Technology Data Exchange (ETDEWEB)

    Federico, A.G.; Musmeci, F. [ENEA, Centro Ricerche Casaccia, Rome (Italy). Dipt. Ambiente

    1998-01-01

    Testing two or more data sets for the hypothesis that they are sampled from the same population is often required in environmental data analysis. Typically the available samples contain only a small number of data points, and often the assumption of normal distributions is not realistic. On the other hand, the spread of today's powerful personal computers opens new opportunities based on massive use of CPU resources. The paper reviews the problem and introduces the feasibility of two nonparametric approaches based on intrinsic equiprobability properties of the data samples. The first is based on full resampling, while the second is based on a bootstrap approach. An easy-to-use program is presented. A case study is given based on the Chernobyl children contamination data.
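
    The two resampling ideas named in this abstract can be illustrated with a short sketch. The report's actual program and test statistic are not described here, so the code below is an assumption-laden illustration: it uses the absolute difference in sample means as the statistic, enumerates every regrouping of the pooled data for the "full resampling" variant, and draws with replacement from the pooled data for the bootstrap variant. All function names and the example data are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(x, y):
    """Two-sided p-value from enumerating every split of the pooled data."""
    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    observed = abs(mean(x) - mean(y))
    pooled_sum = sum(pooled)
    hits = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_x):
        sum_x = sum(pooled[i] for i in idx)
        diff = abs(sum_x / n_x - (pooled_sum - sum_x) / n_y)
        # Small tolerance so the observed split always counts despite rounding.
        if diff >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


def bootstrap_pvalue(x, y, n_resamples=2000, seed=0):
    """Approximate p-value from resampling the pooled data with replacement."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    observed = abs(mean(x) - mean(y))
    hits = 0
    for _ in range(n_resamples):
        boot_x = [rng.choice(pooled) for _ in x]
        boot_y = [rng.choice(pooled) for _ in y]
        if abs(mean(boot_x) - mean(boot_y)) >= observed - 1e-12:
            hits += 1
    return hits / n_resamples


# Hypothetical small samples (not the Chernobyl data).
sample_a = [1, 2, 3]
sample_b = [10, 11, 12]
p_exact = exact_permutation_pvalue(sample_a, sample_b)
p_boot = bootstrap_pvalue(sample_a, sample_b)
```

    Full enumeration is only practical for small samples, since the number of splits grows combinatorially; the bootstrap variant trades exactness for a fixed computational budget.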

  6. An Evaluation of the IntelliMetric[SM] Essay Scoring System

    Science.gov (United States)

    Rudner, Lawrence M.; Garcia, Veronica; Welch, Catherine

    2006-01-01

    This report provides a two-part evaluation of the IntelliMetric[SM] automated essay scoring system based on its performance scoring essays from the Analytic Writing Assessment of the Graduate Management Admission Test[TM] (GMAT[TM]). The IntelliMetric system performance is first compared to that of individual human raters, a Bayesian system…

  7. Assessment of long-term gas sampling design at two commercial manure-belt layer barns.

    Science.gov (United States)

    Chai, Li-Long; Ni, Ji-Qin; Chen, Yan; Diehl, Claude A; Heber, Albert J; Lim, Teng T

    2010-06-01

    Understanding temporal and spatial variations of aerial pollutant concentrations is important for designing air quality monitoring systems. In long-term and continuous air quality monitoring in large livestock and poultry barns, these systems usually use location-shared analyzers and sensors and can only sample air at a limited number of locations. To assess the validity of the gas sampling design at a commercial layer farm, a new methodology was developed to map pollutant gas concentrations using portable sensors under steady-state or quasi-steady-state barn conditions. Three assessment tests were conducted from December 2008 to February 2009 in two manure-belt layer barns. Each barn was 140.2 m long and 19.5 m wide and had 250,000 birds. Each test included four measurements of ammonia and carbon dioxide concentrations at 20 locations that covered all operating fans, including six of the fans used in the long-term sampling that represented three zones along the lengths of the barns, to generate data for complete-barn monitoring. To simulate the long-term monitoring, gas concentrations from the six long-term sampling locations were extracted from the 20 assessment locations. Statistical analyses were performed to test the variances (F-test) and sample means (t test) between the 6- and 20-sample data. The study clearly demonstrated ammonia and carbon dioxide concentration gradients that were characterized by increasing concentrations from the west to east ends of the barns following the under-cage manure-belt travel direction. Mean concentrations increased from 7.1 to 47.7 parts per million (ppm) for ammonia and from 2303 to 3454 ppm for carbon dioxide from the west to east of the barns. Variations of mean gas concentrations were much less apparent between the south and north sides of the barns, because they were 21.2 and 20.9 ppm for ammonia and 2979 and 2951 ppm for carbon dioxide, respectively. The null hypotheses that the variances and means between the 6- and 20

  8. Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF) predictors of police officer problem behavior and collateral self-report test scores.

    Science.gov (United States)

    Tarescavage, Anthony M; Fischler, Gary L; Cappo, Bruce M; Hill, David O; Corey, David M; Ben-Porath, Yossef S

    2015-03-01

    The current study examined the predictive validity of Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008/2011) scores in police officer screenings. We utilized a sample of 712 police officer candidates (82.6% male) from 2 Midwestern police departments. The sample included 426 hired officers, most of whom had supervisor ratings of problem behaviors and human resource records of civilian complaints. With the full sample, we calculated zero-order correlations between MMPI-2-RF scale scores and scale scores from the California Psychological Inventory (Gough, 1956) and Inwald Personality Inventory (Inwald, 2006) by gender. In the hired sample, we correlated MMPI-2-RF scale scores with the outcome data for males only, owing to the relatively small number of hired women. Several scales demonstrated meaningful correlations with the criteria, particularly in the thought dysfunction and behavioral/externalizing dysfunction domains. After applying a correction for range restriction, the correlation coefficient magnitudes were generally in the moderate to large range. The practical implications of these findings were explored by means of risk ratio analyses, which indicated that officers who produced elevations at cutscores lower than the traditionally used 65 T-score level were as much as 10 times more likely than those scoring below the cutoff to exhibit problem behaviors. Overall, the results supported the validity of the MMPI-2-RF in this setting. Implications and limitations of this study are discussed. 2015 APA, all rights reserved

  9. Unexplained Graft Dysfunction after Heart Transplantation—Role of Novel Molecular Expression Test Score and QTc-Interval: A Case Report

    Directory of Open Access Journals (Sweden)

    Khurram Shahzad

    2010-01-01

    Full Text Available In the current era of immunosuppressive medications there is an increased observed incidence of graft dysfunction in the absence of known histological criteria of rejection after heart transplantation. A noninvasive molecular expression diagnostic test was developed and validated to rule out histological acute cellular rejection. In this paper we present, for the first time, the longitudinal pattern of changes in this novel diagnostic test score along with QTc-interval in a patient who was admitted with unexplained graft dysfunction. The patient presented with graft failure with negative findings on all known criteria of rejection, including acute cellular rejection, antibody mediated rejection and cardiac allograft vasculopathy. The molecular expression test score showed a gradual increase and the QTc-interval showed gradual prolongation with the gradual decline in graft function. This paper exemplifies that in patients presenting with unexplained graft dysfunction, GEP test score and QTc-interval correlate with the changes in the graft function.

  10. Clinical performance of two visual scoring systems in detecting and assessing activity status of occlusal caries in primary teeth

    DEFF Research Database (Denmark)

    Braga, M M; Ekstrand, K R; Martignon, S

    2010-01-01

    This study aimed to compare the clinical performance of two sets of visual scoring criteria for detecting caries severity and assessing caries activity status in occlusal surfaces. Two visual scoring systems--the Nyvad criteria (NY) and the ICDAS-II including an adjunct system for lesion activity...

  11. A Prognostic Scoring Tool for Cesarean Organ/Space Surgical Site Infections: Derivation and Internal Validation.

    Science.gov (United States)

    Assawapalanggool, Srisuda; Kasatpibal, Nongyao; Sirichotiyakul, Supatra; Arora, Rajin; Suntornlimsiri, Watcharin

    Organ/space surgical site infections (SSIs) are serious complications after cesarean delivery. However, no scoring tool to predict these complications has yet been developed. This study sought to develop and validate a prognostic scoring tool for cesarean organ/space SSIs. Data on cases and non-cases of cesarean organ/space SSI between January 1, 2007 and December 31, 2012 from a tertiary care hospital in Thailand were analyzed. Stepwise multivariable logistic regression was used to select the best predictor combination and their coefficients were transformed to a risk scoring tool. The positive likelihood ratio for each risk category and the area under receiver operating characteristic (AUROC) curves were analyzed on total scores. Internal validation using bootstrap re-sampling was tested for reproducibility. The predictors of 243 organ/space SSIs from 4,988 eligible cesarean delivery cases comprised the presence of foul-smelling amniotic fluid (four points), vaginal examination five or more times before incision (two points), wound class III or greater (two points), being referred from local setting (two points), hemoglobin less than 11 g/dL (one point), and ethnic minorities (one point). The likelihood ratio of cesarean organ/space SSIs with 95% confidence interval among low (total score of 0-1 point), medium (total score of 2-5 points), and high risk (total score of ≥6 points) categories were 0.11 (0.07-0.19), 1.03 (0.89-1.18), and 13.25 (10.87-16.14), respectively. Both AUROCs of the derivation and validation data were comparable (87.57% versus 86.08%; p = 0.418). This scoring tool showed a high predictive ability regarding cesarean organ/space SSIs on the derivation data and reproducibility was demonstrated on internal validation. It could help practitioners prioritize patient care and management depending on risk category and decrease SSI rates in cesarean deliveries.
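
    The point values and risk bands reported in this abstract can be written out as a small lookup, shown below as an illustrative sketch only. The dictionary keys and function names are hypothetical labels chosen for readability; the points, category cutoffs, and likelihood ratios are the ones quoted in the abstract. This is not a clinical tool.

```python
# Points per predictor, as quoted in the abstract (key names are hypothetical).
POINTS = {
    "foul_smelling_amniotic_fluid": 4,
    "vaginal_exams_five_or_more": 2,
    "wound_class_iii_or_greater": 2,
    "referred_from_local_setting": 2,
    "hemoglobin_below_11_g_dl": 1,
    "ethnic_minority": 1,
}

# Likelihood ratios per risk category, as quoted in the abstract.
LIKELIHOOD_RATIO = {"low": 0.11, "medium": 1.03, "high": 13.25}


def total_score(findings):
    """Sum the points for every predictor present in `findings`."""
    return sum(POINTS[name] for name in findings)


def risk_category(score):
    """Map a total score to the reported bands: 0-1 low, 2-5 medium, 6 or more high."""
    if score <= 1:
        return "low"
    if score <= 5:
        return "medium"
    return "high"


# Hypothetical patient with two of the six predictors present.
example_findings = {"foul_smelling_amniotic_fluid", "hemoglobin_below_11_g_dl"}
example_score = total_score(example_findings)      # 4 + 1 = 5
example_category = risk_category(example_score)    # falls in the 2-5 band
```

    With all six predictors present the maximum total is 12 points, which falls in the high-risk band.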

  12. Pigeons exhibit higher accuracy for chosen memory tests than for forced memory tests in duration matching-to-sample.

    Science.gov (United States)

    Adams, Allison; Santi, Angelo

    2011-03-01

    Following training to match 2- and 8-sec durations of feederlight to red and green comparisons with a 0-sec baseline delay, pigeons were allowed to choose to take a memory test or to escape the memory test. The effects of sample omission, increases in retention interval, and variation in trial spacing on selection of the escape option and accuracy were studied. During initial testing, escaping the test did not increase as the task became more difficult, and there was no difference in accuracy between chosen and forced memory tests. However, with extended training, accuracy for chosen tests was significantly greater than for forced tests. In addition, two pigeons exhibited higher accuracy on chosen tests than on forced tests at the short retention interval and greater escape rates at the long retention interval. These results have not been obtained in previous studies with pigeons when the choice to take the test or to escape the test is given before test stimuli are presented. It appears that task-specific methodological factors may determine whether a particular species will exhibit the two behavioral effects that were initially proposed as potentially indicative of metacognition.

  13. Rugby versus Soccer in South Africa: Content Familiarity Contributes to Cross-Cultural Differences in Cognitive Test Scores

    Science.gov (United States)

    Malda, Maike; van de Vijver, Fons J. R.; Temane, Q. Michael

    2010-01-01

    In this study, cross-cultural differences in cognitive test scores are hypothesized to depend on a test's cultural complexity (Cultural Complexity Hypothesis: CCH), here conceptualized as its content familiarity, rather than on its cognitive complexity (Spearman's Hypothesis: SH). The content familiarity of tests assessing short-term memory,…

  14. DISCRIMINATIVE ANALYSIS OF TESTS FOR EVALUATING SITUATIONMOTORIC ABILITIES BETWEEN TWO GROUPS OF BASKETBALL PLAYERS SELECTED BY THE TEST OF SOCIOMETRY

    OpenAIRE

    Abdulla Elezi; Nazim Myrtaj; Florian Miftari

    2011-01-01

    The aim of this work was to determine differences between two groups of basketball players, selected with the modified sociometric test (Paranosić and Lazarević), on several tests for assessing situation-motor skills. The test sample consisted of 20 basketball players who had the most positive points and 20 basketball players who had the most negative points, 40 players in total. A t-test was applied to determine whether there are differences between the two groups of basketball players who had bee...

  15. An analysis of the VOSP Silhouettes Test with neurological patients

    Directory of Open Access Journals (Sweden)

    THOMAS MERTEN

    2006-12-01

    Full Text Available An item analysis of the Silhouettes, part of the Visual Object and Space Perception Battery, was performed using the test protocols of 266 German-speaking neurological patients with a mean age of 54.8 years, all of them presenting some sort of brain pathology. The sample yielded a mean test score of 17.0 (SD = 4.6). The two subsets of 15 animals and 15 objects were only moderately correlated (0.45), so the inclusion into a single scale is questionable. Other reliability estimates were also rather low (0.62 to 0.77). Moreover, gross deviations in item difficulty were obtained with this sample; scoring rules were found to be insufficiently explicit. Despite moderate rank correlations with other instruments (Hooper VOT: 0.65; WAIS-R Block Design: 0.57; neuropsychological screening battery SKT: -0.45), the psychometric properties obtained with this sample must be considered to be insufficient.

  16. Importance of Statistical Evidence in Estimating Valid DEA Scores.

    Science.gov (United States)

    Barnum, Darold T; Johnson, Matthew; Gleason, John M

    2016-03-01

    Data Envelopment Analysis (DEA) allows healthcare scholars to measure productivity in a holistic manner. It combines a production unit's multiple outputs and multiple inputs into a single measure of its overall performance relative to other units in the sample being analyzed. It accomplishes this task by aggregating a unit's weighted outputs and dividing the output sum by the unit's aggregated weighted inputs, choosing output and input weights that maximize its output/input ratio when the same weights are applied to other units in the sample. Conventional DEA assumes that inputs and outputs are used in different proportions by the units in the sample. So, for the sample as a whole, inputs have been substituted for each other and outputs have been transformed into each other. Variables are assigned different weights based on their marginal rates of substitution and marginal rates of transformation. If in truth inputs have not been substituted nor outputs transformed, then there will be no marginal rates and therefore no valid basis for differential weights. This paper explains how to statistically test for the presence of substitutions among inputs and transformations among outputs. Then, it applies these tests to the input and output data from three healthcare DEA articles, in order to identify the effects on DEA scores when input substitutions and output transformations are absent in the sample data. It finds that DEA scores are badly biased when substitution and transformation are absent and conventional DEA models are used.
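
    The ratio at the heart of DEA, as described in this abstract, can be sketched in a few lines. The code below is a simplified illustration under a strong assumption: the output and input weights are fixed by the caller, whereas conventional DEA solves an optimization problem for each unit to choose the weights most favorable to it. That optimization step, and the statistical tests the paper proposes, are not shown. Unit names and data are hypothetical.

```python
def weighted_ratio(outputs, inputs, out_weights, in_weights):
    """Weighted sum of outputs divided by weighted sum of inputs for one unit."""
    numerator = sum(w * y for w, y in zip(out_weights, outputs))
    denominator = sum(w * x for w, x in zip(in_weights, inputs))
    return numerator / denominator


def relative_efficiencies(units, out_weights, in_weights):
    """Scale each unit's ratio by the best ratio in the sample (best unit scores 1.0).

    `units` maps a unit name to a pair (outputs, inputs). The weights are held
    fixed here, which is a simplification of what DEA does.
    """
    ratios = {
        name: weighted_ratio(outputs, inputs, out_weights, in_weights)
        for name, (outputs, inputs) in units.items()
    }
    best = max(ratios.values())
    return {name: ratio / best for name, ratio in ratios.items()}


# Hypothetical single-output, single-input example.
hospitals = {
    "A": ([10.0], [5.0]),
    "B": ([6.0], [6.0]),
}
scores = relative_efficiencies(hospitals, out_weights=[1.0], in_weights=[1.0])
```

    With a single input and a single output the weights cancel out of the comparison, so the scores depend only on each unit's output/input ratio; differential weights start to matter only when several inputs or outputs are combined, which is the situation the abstract's argument about substitution and transformation addresses.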

  17. Simple shoulder test and Oxford Shoulder Score: Persian translation and cross-cultural validation.

    Science.gov (United States)

    Naghdi, Soofia; Nakhostin Ansari, Noureddin; Rustaie, Nilufar; Akbari, Mohammad; Ebadi, Safoora; Senobari, Maryam; Hasson, Scott

    2015-12-01

    To translate, culturally adapt, and validate the simple shoulder test (SST) and Oxford Shoulder Score (OSS) into Persian language using a cross-sectional and prospective cohort design. A standard forward and backward translation was followed to culturally adapt the SST and the OSS into Persian language. Psychometric properties of floor and ceiling effects, construct convergent validity, discriminant validity, internal consistency reliability, test-retest reliability, standard error of the measurement (SEM), smallest detectable change (SDC), and factor structure were determined. One hundred patients with shoulder disorders and 50 healthy subjects participated in the study. The PSST and the POSS showed no missing responses. No floor or ceiling effects were observed. Both the PSST and POSS detected differences between patients and healthy subjects supporting their discriminant validity. Construct convergent validity was confirmed by a very good correlation between the PSST and POSS (r = 0.68). There was high internal consistency for both the PSST (α = 0.73) and the POSS (α = 0.91 and 0.92). Test-retest reliability with 1-week interval was excellent (ICCagreement = 0.94 for PSST and 0.90 for POSS). Factor analyses demonstrated a three-factor solution for the PSST (49.7 % of variance) and a two-factor solution for the POSS (61.6 % of variance). The SEM/SDC was satisfactory for PSST (5.5/15.3) and POSS (6.8/18.8). The PSST and POSS are valid and reliable outcome measures for assessing functional limitations in Persian-speaking patients with shoulder disorders.
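
    The SEM/SDC pairs reported in this abstract are close to what a commonly used formula gives, SDC = 1.96 × √2 × SEM. The abstract does not state which formula the authors used, so the sketch below is an assumption; the function name is hypothetical.

```python
import math


def smallest_detectable_change(sem, z=1.96):
    """SDC at the individual level from the standard error of measurement.

    Uses the common formula z * sqrt(2) * SEM; that the study used exactly
    this formula is an assumption, not something the abstract states.
    """
    return z * math.sqrt(2.0) * sem


# SEM values quoted in the abstract for the PSST and the POSS.
sdc_psst = smallest_detectable_change(5.5)
sdc_poss = smallest_detectable_change(6.8)
```

    Under this assumption the results come out at roughly 15.2 and 18.8, close to the reported 15.3 and 18.8; the small gap for the PSST would be consistent with the SEM having been rounded before it was reported.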

  18. Comparison of disease prevalence in two populations in the presence of misclassification.

    Science.gov (United States)

    Tang, Man-Lai; Qiu, Shi-Fang; Poon, Wai-Yin

    2012-11-01

    Comparing disease prevalence in two groups is an important topic in medical research, and prevalence rates are obtained by classifying subjects according to whether they have the disease. Either high-cost infallible gold-standard classifiers or low-cost fallible classifiers can be used to classify subjects. However, statistical analysis that is based on data sets with misclassifications leads to biased results. As a compromise between the two classification approaches, partially validated sets are often used in which all individuals are classified by fallible classifiers, and some of the individuals are validated by the accurate gold-standard classifiers. In this article, we develop several reliable test procedures and approximate sample size formulas for disease prevalence studies based on the difference between two disease prevalence rates with two independent partially validated series. Empirical studies show that (i) the Score test produces close-to-nominal level and is preferred in practice; and (ii) the sample size formula based on the Score test is also fairly accurate in terms of the empirical power and type I error rate, and is hence recommended. A real example from an aplastic anemia study is used to illustrate the proposed methodologies. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Matrix Sampling of Items in Large-Scale Assessments

    Directory of Open Access Journals (Sweden)

    Ruth A. Childs

    2003-07-01

    Full Text Available Matrix sampling of items (that is, division of a set of items into different versions of a test form) is used by several large-scale testing programs. Like other test designs, matrixed designs have both advantages and disadvantages. For example, testing time per student is less than if each student received all the items, but the comparability of student scores may decrease. Also, curriculum coverage is maintained, but reporting of scores becomes more complex. In this paper, matrixed designs are compared with more traditional designs in nine categories of costs: development costs, materials costs, administration costs, educational costs, scoring costs, reliability costs, comparability costs, validity costs, and reporting costs. In choosing among test designs, a testing program should examine the costs in light of its mandate(s), the content of the tests, and the financial resources available, among other considerations.

  20. Test of a sample container for shipment of small size plutonium samples with PAT-2

    International Nuclear Information System (INIS)

    Kuhn, E.; Aigner, H.; Deron, S.

    1981-11-01

    A light-weight container for the air transport of plutonium, to be designated PAT-2, has been developed in the USA and is presently undergoing licensing. The very limited effective space for bearing plutonium required the design of small size sample canisters to meet the needs of international safeguards for the shipment of plutonium samples. The applicability of a small canister for the sampling of small size powder and solution samples has been tested in an intralaboratory experiment. The results of the experiment, based on the concept of pre-weighed samples, show that the tested canister can successfully be used for the sampling of small size PuO2-powder samples of homogeneous source material, as well as for dried aliquands of plutonium nitrate solutions. (author)

  1. To test or not to test

    DEFF Research Database (Denmark)

    Rochon, Justine; Gondan, Matthias; Kieser, Meinhard

    2012-01-01

    Background: Student's two-sample t test is generally used for comparing the means of two independent samples, for example, two treatment arms. Under the null hypothesis, the t test assumes that the two samples arise from the same normally distributed population with unknown variance. Adequate...... control of the Type I error requires that the normality assumption holds, which is often examined by means of a preliminary Shapiro-Wilk test. The following two-stage procedure is widely accepted: If the preliminary test for normality is not significant, the t test is used; if the preliminary test rejects...... the null hypothesis of normality, a nonparametric test is applied in the main analysis. Methods: Equally sized samples were drawn from exponential, uniform, and normal distributions. The two-sample t test was conducted if either both samples (Strategy I) or the collapsed set of residuals from both samples...

  2. DWPF Sample Vial Insert Study-Statistical Analysis of DWPF Mock-Up Test Data

    Energy Technology Data Exchange (ETDEWEB)

    Harris, S.P. [Westinghouse Savannah River Company, AIKEN, SC (United States)

    1997-09-18

    This report is prepared as part of Technical/QA Task Plan WSRC-RP-97-351, which was issued in response to Technical Task Request HLW/DWPF/TTR-970132 submitted by DWPF. Presented in this report is a statistical analysis of DWPF Mock-up test data for evaluation of two new analytical methods which use insert samples from the existing Hydragard™ sampler. The first is a new hydrofluoric acid based method called the Cold Chemical Method (Cold Chem) and the second is a modified fusion method. Either new DWPF analytical method could result in a two to three fold improvement in sample analysis time. Both new methods use the existing Hydragard™ sampler to collect a smaller insert sample from the process sampling system. The insert testing methodology applies to the DWPF Slurry Mix Evaporator (SME) and the Melter Feed Tank (MFT) samples. The insert sample is named after the initial trials which placed the container inside the sample (peanut) vials. Samples in small 3 ml containers (Inserts) are analyzed by either the cold chemical method or a modified fusion method. The current analytical method uses a Hydragard™ sample station to obtain nearly full 15 ml peanut vials. The samples are prepared by a multi-step process for Inductively Coupled Plasma (ICP) analysis by drying, vitrification, grinding and finally dissolution by either mixed acid or fusion. In contrast, the insert sample is placed directly in the dissolution vessel, thus eliminating the drying, vitrification and grinding operations for the Cold Chem method. Although the modified fusion still requires drying and calcine conversion, the process is rapid because of the decreased sample size and because no vitrification step is required. A slurry feed simulant material was acquired from the TNX pilot facility from the test run designated as PX-7. The Mock-up test data were gathered on the basis of a statistical design presented in SRT-SCS-97004 (Rev. 0). Simulant PX-7 samples were taken in the DWPF Analytical Cell Mock

  3. Best waveform score for diagnosing keratoconus

    Directory of Open Access Journals (Sweden)

    Allan Luz

    2013-12-01

    PURPOSE: To test whether corneal hysteresis (CH) and corneal resistance factor (CRF) can discriminate between keratoconus and normal eyes and to evaluate whether the averages of two consecutive measurements perform differently from the one with the best waveform score (WS) for diagnosing keratoconus. METHODS: ORA measurements for one eye per individual were selected randomly from 53 normal patients and from 27 patients with keratoconus. Two groups were considered: the average (CH-Avg, CRF-Avg) and best waveform score (CH-WS, CRF-WS) groups. The Mann-Whitney U-test was used to evaluate whether the variables had similar distributions in the Normal and Keratoconus groups. Receiver operating characteristic (ROC) curves were calculated for each parameter to assess the efficacy for diagnosing keratoconus, and the curves obtained for each variable were compared pairwise using the Hanley-McNeil test. RESULTS: The CH-Avg, CRF-Avg, CH-WS and CRF-WS differed significantly between the normal and keratoconus groups (p<0.001). The areas under the ROC curve (AUROC) for CH-Avg, CRF-Avg, CH-WS, and CRF-WS were 0.824, 0.873, 0.891, and 0.931, respectively. CH-WS and CRF-WS had significantly better AUROCs than CH-Avg and CRF-Avg, respectively (p=0.001 and 0.002). CONCLUSION: The analysis of the biomechanical properties of the cornea through the ORA method has proved to be an important aid in the diagnosis of keratoconus, regardless of the method used. The best waveform score (WS) measurements were superior to the average of consecutive ORA measurements for diagnosing keratoconus.
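
    The link between the Mann-Whitney U statistic and the area under the ROC curve used in this record can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name and the sample values are hypothetical, and it assumes that higher values indicate disease (for a marker that is lower in disease, the two groups would be swapped).

```python
def auc_from_mann_whitney(negatives, positives):
    """Area under the ROC curve computed as U / (n_pos * n_neg).

    Each (positive, negative) pair counts 1 if the positive case has the
    higher value and 0.5 if the two values are tied.
    """
    u = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u / (len(positives) * len(negatives))


# Hypothetical measurements, not data from the study.
normal_eyes = [1, 2, 3, 4]
diseased_eyes = [3, 5, 6]
auc = auc_from_mann_whitney(normal_eyes, diseased_eyes)
```

    An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.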

  4. A risk score to predict type 2 diabetes mellitus in an elderly Spanish Mediterranean population at high cardiovascular risk.

    Directory of Open Access Journals (Sweden)

    Marta Guasch-Ferré

    INTRODUCTION: To develop and test a diabetes risk score to predict incident diabetes in an elderly Spanish Mediterranean population at high cardiovascular risk. MATERIALS AND METHODS: A diabetes risk score was derived from a subset of 1381 nondiabetic individuals from three centres of the PREDIMED study (derivation sample). Multivariate Cox regression model β-coefficients were used to weigh each risk factor. PREDIMED-personal Score included body-mass-index, smoking status, family history of type 2 diabetes, alcohol consumption and hypertension as categorical variables; PREDIMED-clinical Score included also high blood glucose. We tested the predictive capability of these scores in the DE-PLAN-CAT cohort (validation sample). The discrimination of Finnish Diabetes Risk Score (FINDRISC), German Diabetes Risk Score (GDRS) and our scores was assessed with the area under curve (AUC). RESULTS: The PREDIMED-clinical Score varied from 0 to 14 points. In the subset of the PREDIMED study, 155 individuals developed diabetes during the 4.75-years follow-up. The PREDIMED-clinical score at a cutoff of ≥6 had sensitivity of 72.2%, and specificity of 72.5%, whereas AUC was 0.78. The AUC of the PREDIMED-clinical Score was 0.66 in the validation sample (sensitivity = 85.4%; specificity = 26.6%), and was significantly higher than the FINDRISC and the GDRS in both the derivation and validation samples. DISCUSSION: We identified classical risk factors for diabetes and developed the PREDIMED-clinical Score to determine those individuals at high risk of developing diabetes in elderly individuals at high cardiovascular risk. The predictive capability of the PREDIMED-clinical Score was significantly higher than the FINDRISC and GDRS, and also used fewer items in the questionnaire.
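
    How sensitivity and specificity follow from a score cutoff such as the "≥6" reported above can be sketched as below. The function name and the eight example records are hypothetical and are not PREDIMED data; the sketch only assumes that a score at or above the cutoff counts as a positive screen.

```python
def sensitivity_specificity(scores, developed_disease, cutoff):
    """Return (sensitivity, specificity) when score >= cutoff flags high risk."""
    tp = fn = tn = fp = 0
    for score, diseased in zip(scores, developed_disease):
        flagged = score >= cutoff
        if diseased and flagged:
            tp += 1          # diseased and correctly flagged
        elif diseased:
            fn += 1          # diseased but missed
        elif flagged:
            fp += 1          # healthy but flagged
        else:
            tn += 1          # healthy and correctly not flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical risk scores and outcomes, for illustration only.
scores = [2, 7, 6, 3, 9, 5, 8, 1]
developed = [False, True, False, False, True, True, True, False]
sens, spec = sensitivity_specificity(scores, developed, cutoff=6)
```

    Raising the cutoff generally trades sensitivity for specificity, which is why a single cutoff is reported alongside the AUC.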

  5. Differences in distribution of T-scores and Z-scores among bone densitometry tests in postmenopausal women (a comparative study)

    International Nuclear Information System (INIS)

    Wendlova, J.

    2002-01-01

    To determine the character of T-score and Z-score value distribution in individually selected methods of bone densitometry and to compare them using statistical analysis. We examined 56 postmenopausal women aged between 43 and 68 years with osteopenia or osteoporosis according to the WHO classification. The following measurements were made in each patient: T-score and Z-score for: 1) Stiffness index (S) of the left heel bone, USM (index); 2) Bone mineral density of the left heel bone (BMDh), DEXA (g of Ca hydroxyapatite per cm²); 3) Bone mineral density of trabecular bone of the L1 vertebra (BMDL1), QCT (mg of Ca hydroxyapatite per cm³). The densitometers used in the study were: ultrasonometer to measure heel bone, Achilles Plus, LUNAR, USA; DEXA to measure heel bone, PIXI, LUNAR, USA; QCT to measure the L1 vertebra, CT, SOMATOM Plus, Siemens, Germany. Statistical analysis: differences between measured values of T-scores (Z-scores) were evaluated by parametric or non-parametric methods of determining the 95% confidence intervals (C.I.). Differences between Z-score and T-score values for compared measurements were statistically significant; however, these differences were lower for Z-scores. Largest differences in 95% C.I., characterizing individual measurements of T-score values (in comparison with Z-scores), were found for those densitometers whose age range of the reference groups of young adults differed the most, and conversely, the smallest differences in T-score values were found when the differences between the age ranges of reference groups were smallest. The higher variation in T-score values in comparison to Z-scores is also caused by a non-standard selection of the reference groups of young adults for the QCT, PIXI and Achilles Plus densitometers used in the study. Age characteristics of the reference group for T-scores should be standardized for all types of densitometers. (author)

  6. Risk score for first-screening of prevalent undiagnosed chronic kidney disease in Peru: the CRONICAS-CKD risk score.

    Science.gov (United States)

    Carrillo-Larco, Rodrigo M; Miranda, J Jaime; Gilman, Robert H; Medina-Lezama, Josefina; Chirinos-Pacheco, Julio A; Muñoz-Retamozo, Paola V; Smeeth, Liam; Checkley, William; Bernabe-Ortiz, Antonio

    2017-11-29

    Chronic Kidney Disease (CKD) represents a great burden for the patient and the health system, particularly if diagnosed at late stages. Consequently, tools to identify patients at high risk of having CKD are needed, particularly in limited-resources settings where laboratory facilities are scarce. This study aimed to develop a risk score for prevalent undiagnosed CKD using data from four settings in Peru: a complete risk score including all associated risk factors and another excluding laboratory-based variables. Cross-sectional study. We used two population-based studies: one for developing and internal validation (CRONICAS), and another (PREVENCION) for external validation. Risk factors included clinical- and laboratory-based variables, among others: sex, age, hypertension and obesity; and lipid profile, anemia and glucose metabolism. The outcome was undiagnosed CKD: eGFR anemia were strongly associated with undiagnosed CKD. In the external validation, at a cut-off point of 2, the complete and laboratory-free risk scores performed similarly well with a ROC area of 76.2% and 76.0%, respectively (P = 0.784). The best assessment parameter of these risk scores was their negative predictive value: 99.1% and 99.0% for the complete and laboratory-free, respectively. The developed risk scores showed a moderate performance as a screening test. People with a score of ≥ 2 points should undergo further testing to rule out CKD. Using the laboratory-free risk score is a practical approach in developing countries where laboratories are not readily available and undiagnosed CKD has significant morbidity and mortality.

  7. A Study on Variables that Affect Class Scores of Primary Education Students in Placement Test

    OpenAIRE

    Yavuz, Mustafa

    2010-01-01

    This study aims to determine the variables that predict class scores which are obtained by adding 70 % of the Placement Test (PT) scores of the primary education sixth and seventh grade students who took it for the first time in the 2007-2008 academic year within the framework of the system of passing to secondary education reorganized by the MNE, 25 % of their end-of-the-year passing grades. The study is of general survey model. The study group consists of students who took the PT in the 200...

  8. Analysis of small sample size studies using nonparametric bootstrap test with pooled resampling method.

    Science.gov (United States)

    Dwivedi, Alok Kumar; Mallawaarachchi, Indika; Alvarado, Luis A

    2017-06-30

    Experimental studies in biomedical research frequently pose analytical problems related to small sample size. In such studies, there are conflicting findings regarding the choice of parametric and nonparametric analysis, especially with non-normal data. In such instances, some methodologists questioned the validity of parametric tests and suggested nonparametric tests. In contrast, other methodologists found nonparametric tests to be too conservative and less powerful and thus preferred using parametric tests. Some researchers have recommended using a bootstrap test; however, this method also has a small sample size limitation. We used a pooled method in the nonparametric bootstrap test that may overcome the problem related to small samples in hypothesis testing. The present study compared the nonparametric bootstrap test with pooled resampling method with the corresponding parametric, nonparametric, and permutation tests through extensive simulations under various conditions and using real data examples. The nonparametric pooled bootstrap t-test provided equal or greater power for comparing two means as compared with unpaired t-test, Welch t-test, Wilcoxon rank sum test, and permutation test while maintaining type I error probability for any conditions except for Cauchy and extreme variable lognormal distributions. In such cases, we suggest using an exact Wilcoxon rank sum test. Nonparametric bootstrap paired t-test also provided better performance than other alternatives. Nonparametric bootstrap test provided benefit over exact Kruskal-Wallis test. We suggest using nonparametric bootstrap test with pooled resampling method for comparing paired or unpaired means and for validating the one way analysis of variance test results for non-normal data in small sample size studies. Copyright © 2017 John Wiley & Sons, Ltd.
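
    One plausible reading of a pooled-resampling bootstrap t-test for two unpaired means is sketched below. The abstract does not give the algorithm, so everything here is an assumption: both bootstrap groups are drawn with replacement from the pooled data (which mimics the null hypothesis of a common distribution), the statistic is the equal-variance Student t, and the function names, the resample count and the add-one p-value correction are illustrative choices rather than the authors' procedure.

```python
import math
import random
import statistics


def t_statistic(x, y):
    """Equal-variance two-sample t statistic; guards against zero spread."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    diff = statistics.mean(x) - statistics.mean(y)
    if se == 0.0:
        # Both samples constant: no spread to scale the difference by.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def pooled_bootstrap_p_value(x, y, n_resamples=2000, seed=0):
    """Two-sided p-value from resampling both groups out of the pooled data."""
    rng = random.Random(seed)
    observed = abs(t_statistic(x, y))
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(n_resamples):
        boot_x = rng.choices(pooled, k=len(x))   # sample with replacement
        boot_y = rng.choices(pooled, k=len(y))
        if abs(t_statistic(boot_x, boot_y)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical small samples, for illustration only.
p_separated = pooled_bootstrap_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
p_identical = pooled_bootstrap_p_value([1, 2, 3], [1, 2, 3])
```

    Fixing the seed makes the sketch reproducible; in practice the number of resamples would be chosen to give a stable p-value.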

  9. Validity of the Wechsler Test of Adult Reading (WTAR): effort considered in a clinical sample of U.S. military veterans.

    Science.gov (United States)

    Whitney, Kriscinda A; Shepard, Polly H; Mariner, Jennifer; Mossbarger, Brad; Herman, Steven M

    2010-07-01

    The current study represents an examination of the construct validity of the Wechsler Test of Adult Reading (WTAR) among a sample of U.S. military veterans referred for outpatient neuropsychological evaluation that included a measure of negative response bias, namely, the Test of Memory Malingering (TOMM). This retrospective data analysis examined the relationship between the WTAR and measures of current verbal general intellectual function and current cognitive skills. Findings showed that, among patients passing the TOMM (N = 98), WTAR scores were most highly correlated with current verbal IQ but also showed significant correlations with verbal memory and lesser, but still significant, correlations with measures of visual-spatial memory. Discriminant validity for the WTAR was also shown among the group passing the TOMM in the sense that the WTAR, which is designed to measure verbal premorbid general intellectual skill, was not as highly correlated with measures of learning and memory as was a measure of current verbal general intellectual skill. Whereas scores on most study measures did significantly differ between the groups that passed versus failed the TOMM (N = 26), scores on the WTAR did not, suggesting that the WTAR may remain robust even in the face of suboptimal effort.

  10. An analysis of aviation test scores to characterize Student Naval Aviator disqualification

    OpenAIRE

    Wahl, Erich J.

    1998-01-01

    Approved for public release; distribution is unlimited. The U.S. Navy uses the Aviation Selection Test Battery (ASTB) to identify those Student Naval Aviator (SNA) applicants most likely to succeed in flight training. Using classification and regression trees, this thesis concludes that individual answers to an ASTB subtest, the Biographical Inventory, are not good predictors of SNA primary flight grades. It also concludes that those SNA who score less than a 6 on the Pilot Biographical Inv...

  11. Quasi-supervised scoring of human sleep in polysomnograms using augmented input variables.

    Science.gov (United States)

    Yaghouby, Farid; Sunderam, Sridhar

    2015-04-01

    The limitations of manual sleep scoring make computerized methods highly desirable. Scoring errors can arise from human rater uncertainty or inter-rater variability. Sleep scoring algorithms either come as supervised classifiers that need scored samples of each state to be trained, or as unsupervised classifiers that use heuristics or structural clues in unscored data to define states. We propose a quasi-supervised classifier that models observations in an unsupervised manner but mimics a human rater wherever training scores are available. EEG, EMG, and EOG features were extracted in 30-s epochs from human-scored polysomnograms recorded from 42 healthy human subjects (18-79 years) and archived in an anonymized, publicly accessible database. Hypnograms were modified so that: (1) some states are scored but not others; (2) samples of all states are scored but not for transitional epochs; and (3) two raters with 67% agreement are simulated. A framework for quasi-supervised classification was devised in which unsupervised statistical models (specifically Gaussian mixtures and hidden Markov models) are estimated from unlabeled training data, but the training samples are augmented with variables whose values depend on available scores. Classifiers were fitted to signal features incorporating partial scores, and used to predict scores for complete recordings. Performance was assessed using Cohen's κ statistic. The quasi-supervised classifier performed significantly better than an unsupervised model and sometimes as well as a completely supervised model despite receiving only partial scores. The quasi-supervised algorithm addresses the need for classifiers that mimic scoring patterns of human raters while compensating for their limitations. Copyright © 2015 Elsevier Ltd. All rights reserved.
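
    Cohen's κ, the agreement measure named in this record, corrects raw agreement between two scorers for the agreement expected by chance. The sketch below is a generic illustration: the function name, the stage labels and the six example epochs are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    Undefined (division by zero) when chance agreement equals 1, i.e. when
    both raters use a single identical label throughout.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical epoch labels (W = wake, N = non-REM, R = REM).
human = ["W", "W", "N", "N", "R", "R"]
machine = ["W", "N", "N", "N", "R", "W"]
kappa = cohens_kappa(human, machine)
```

    Here raw agreement is 4 of 6 epochs, chance agreement is 1/3, and κ is therefore 0.5.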

  12. Reconnaissance On Chi-Square Test Procedure For Determining Two Species Association

    Science.gov (United States)

    Marisa, Hanifa

    2008-01-01

    Determining the association of two species by using the chi-square test has been published. Applying this procedure to plant species at a certain location shows that the procedure could not find an "ecological" association. Tens of sampling units were made to record some weed species in Indralaya, South Sumatera. The chi-square test, $\chi_t^2 = N\left[\,|ad - bc| - N/2\,\right]^2 / (mnrs)$ (Eq. 1), on two species of the weeds (Cleome sp. and Eleusine indica) shows a positive association, while ecologically in nature there is no relationship between them. Some alternatives are proposed for this problem: simplify the chi-square test steps, study further to find the ecological association, or, as a last resort, ignore it.
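
    Eq. 1 has the form of the continuity-corrected (Yates) chi-square for a 2×2 presence/absence table, and can be sketched as below. The abstract does not define its symbols, so the mapping is an assumption: a = units with both species, b = first species only, c = second species only, d = neither, with m, n the row totals and r, s the column totals. The counts in the example are hypothetical, not the Indralaya data.

```python
def yates_chi_square(a, b, c, d):
    """Continuity-corrected chi-square for a 2x2 species association table."""
    total = a + b + c + d
    m, n = a + b, c + d      # row totals: first species present / absent
    r, s = a + c, b + d      # column totals: second species present / absent
    return total * (abs(a * d - b * c) - total / 2) ** 2 / (m * n * r * s)


def association_sign(a, b, c, d):
    """Direction of association: positive when ad > bc, negative when ad < bc."""
    cross = a * d - b * c
    return "positive" if cross > 0 else "negative" if cross < 0 else "none"


# Hypothetical counts over 20 sampling units.
chi2 = yates_chi_square(12, 2, 2, 4)
sign = association_sign(12, 2, 2, 4)
```

    With one degree of freedom the statistic is compared against 3.84 for a 5% significance level; the hypothetical value here (about 3.28) falls below it.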

  13. Diabetes Fear of Injecting and Self-Testing Questionnaire

    DEFF Research Database (Denmark)

    Mollema, E D; Snoek, Frank J; Pouwer, F

    2000-01-01

    OBJECTIVE: To study the psychometric properties of the Diabetes Fear of Injecting and Self-Testing Questionnaire (D-FISQ). RESEARCH DESIGN AND METHODS: Two groups of patients were studied. Sample A consisted of 252 insulin-treated diabetes patients. Sample B incorporated 24 insulin-treated patients......-injecting or self-testing had higher scores on FSI (P = 0.095) and FST (P = 0.01). EFA yielded 2 separate factors, FSI and FST. CONCLUSIONS: Results from this study support reliability and validity of the D-FISQ, a self-report instrument that can be used for both clinical and research purposes....

  14. Direct power comparisons between simple LOD scores and NPL scores for linkage analysis in complex diseases.

    Science.gov (United States)

    Abreu, P C; Greenberg, D A; Hodge, S E

    1999-09-01

    Several methods have been proposed for linkage analysis of complex traits with unknown mode of inheritance. These methods include the LOD score maximized over disease models (MMLS) and the "nonparametric" linkage (NPL) statistic. In previous work, we evaluated the increase of type I error when maximizing over two or more genetic models, and we compared the power of MMLS to detect linkage, in a number of complex modes of inheritance, with analysis assuming the true model. In the present study, we compare MMLS and NPL directly. We simulated 100 data sets with 20 families each, using 26 generating models: (1) 4 intermediate models (penetrance of heterozygote between that of the two homozygotes); (2) 6 two-locus additive models; and (3) 16 two-locus heterogeneity models (admixture alpha = 1.0, 0.7, 0.5, and 0.3; alpha = 1.0 replicates simple Mendelian models). For LOD scores, we assumed dominant and recessive inheritance with 50% penetrance. We took the higher of the two maximum LOD scores and subtracted 0.3 to correct for multiple tests (MMLS-C). We compared expected maximum LOD scores and power, using MMLS-C and NPL as well as the true model. Since NPL uses only the affected family members, we also performed an affecteds-only analysis using MMLS-C. The MMLS-C was both uniformly more powerful than NPL for most cases we examined, except when linkage information was low, and close to the results for the true model under locus heterogeneity. We still found better power for the MMLS-C compared with NPL in affecteds-only analysis. The results show that use of two simple modes of inheritance at a fixed penetrance can have more power than NPL when the trait mode of inheritance is complex and when there is heterogeneity in the data set.

  15. Just as smart but not as successful: obese students obtain lower school grades but equivalent test scores to nonobese students.

    Science.gov (United States)

    MacCann, C; Roberts, R D

    2013-01-01

    The obesity epidemic in industrialized nations has important implications for education, as research demonstrates lower academic achievement among obese students. The current paper compares the test scores and school grades of obese, overweight and normal-weight students in secondary and further education, controlling for demographic variables, personality, ability and well-being confounds. This study included 383 eighth-grade students (49% female; study 1) and 1036 students from 24 community colleges and universities (64% female, study 2), both drawn from five regions across the United States. In study 1, body mass index (BMI) was calculated using self-reports and parent reports of weight and height. In study 2, BMI was calculated from self-reported weight and height only. Both samples completed age-appropriate assessments of mathematics, vocabulary and the personality trait conscientiousness. Eighth-grade students additionally completed a measure of life satisfaction, with both self-reports and parent reports of their grades from the previous semester also obtained. Higher education students additionally completed measures of positive and negative affect, and self-reported their grades and college entrance scores. Obese students receive significantly lower grades in middle school (d=0.83), community college (d=0.34) and university (d=0.36), but show no statistically significant differences in intelligence or achievement test scores. Even after controlling for demographic variables, intelligence, personality and well-being, obese students obtain significantly lower grades than normal-weight students in the eighth grade (d=0.39), community college (d=0.42) and university (d=0.31). Lower grades may reflect peer and teacher prejudice against overweight and obese students rather than lack of ability among these students.

  16. Outgassing tests on IRAS solar panel samples

    Science.gov (United States)

    Premat, G.; Zwaal, A.; Pennings, N. H.

    1980-01-01

    Several outgassing tests were carried out on representative solar panel samples in order to determine the extent of contamination that could be expected from this source. The materials for the construction of the solar panels were selected as a result of contamination obtained in micro volatile condensable materials tests.

  17. Psychometric Evaluation of the Lower Extremity Computerized Adaptive Test, the Modified Harris Hip Score, and the Hip Outcome Score.

    Science.gov (United States)

    Hung, Man; Hon, Shirley D; Cheng, Christine; Franklin, Jeremy D; Aoki, Stephen K; Anderson, Mike B; Kapron, Ashley L; Peters, Christopher L; Pelt, Christopher E

    2014-12-01

    The applicability and validity of many patient-reported outcome measures in the high-functioning population are not well understood. To compare the psychometric properties of the modified Harris Hip Score (mHHS), the Hip Outcome Score activities of daily living subscale (HOS-ADL) and sports (HOS-sports), and the Lower Extremity Computerized Adaptive Test (LE CAT). The hypothesis was that all instruments would perform well but that the LE CAT would show superiority psychometrically because a combination of CAT and a large item bank allows for a high degree of measurement precision. Cohort study (diagnosis); Level of evidence, 2. Data were collected from 472 advanced-age, active participants from the Huntsman World Senior Games in 2012. Validity evidences were examined through item fit, dimensionality, monotonicity, local independence, differential item functioning, person raw score to measure correlation, and instrument coverage (ie, ceiling and floor effects), and reliability evidences were examined through Cronbach alpha and person separation index. All instruments demonstrated good item fit, unidimensionality, monotonicity, local independence, and person raw score to measure correlations. The HOS-ADL had high ceiling effects of 36.02%, and the mHHS had ceiling effects of 27.54%. The LE CAT had ceiling effects of 8.47%, and the HOS-sports had no ceiling effects. None of the instruments had any floor effects. The mHHS had a very low Cronbach alpha of 0.41 and an extremely low person separation index of 0.08. Reliabilities for the LE CAT were excellent and for the HOS-ADL and HOS-sports were good. The LE CAT showed better psychometric properties overall than the HOS-ADL, HOS-sports, and mHHS for the senior population. The mHHS demonstrated pronounced ceiling effects and poor reliabilities that should be of concern. The high ceiling effects for the HOS-ADL were also of concern. The LE CAT was superior in all psychometric aspects examined in this study. Future

  18. Bender Test with dyslexics: comparison of two scoring systems

    Directory of Open Access Journals (Sweden)

    Acácia Aparecida Angeli dos Santos

    2007-06-01

    The present study aimed to evaluate maturational and dysfunctional aspects of the visual-motor perception of dyslexics using the Bender Test, analyzed under two scoring systems: the gradual scoring system (B-SPG) and the Lacks system. Twenty dyslexics aged 9 to 16 years (M = 12) took part in the research, 16 male and 4 female. The data showed that the average number of errors made by the dyslexics was above that expected for the nine- and ten-year-old children who compose the B-SPG normative sample. According to the Lacks scoring system, the factors most affected in the sample of dyslexics were changes in the form of the gestalt and distortion of the gestalt, which are equivalent to distortion of form in the B-SPG. The correlation between the two systems was significant and high (r = 0.76).

  19. Virtual rough samples to test 3D nanometer-scale scanning electron microscopy stereo photogrammetry.

    Science.gov (United States)

    Villarrubia, J S; Tondare, V N; Vladár, A E

    2016-01-01

    The combination of scanning electron microscopy for high spatial resolution, images from multiple angles to provide 3D information, and commercially available stereo photogrammetry software for 3D reconstruction offers promise for nanometer-scale dimensional metrology in 3D. A method is described to test 3D photogrammetry software by the use of virtual samples-mathematical samples from which simulated images are made for use as inputs to the software under test. The virtual sample is constructed by wrapping a rough skin with any desired power spectral density around a smooth near-trapezoidal line with rounded top corners. Reconstruction is performed with images simulated from different angular viewpoints. The software's reconstructed 3D model is then compared to the known geometry of the virtual sample. Three commercial photogrammetry software packages were tested. Two of them produced results for line height and width that were within close to 1 nm of the correct values. All of the packages exhibited some difficulty in reconstructing details of the surface roughness.

  20. The standardised copy of pentagons test

    Directory of Open Access Journals (Sweden)

    Terzoglou Vassiliki A

    2011-04-01

    Background: The 'double-diamond copy' task is a simple paper and pencil test that is part of the Bender-Gestalt Test and the Mini Mental State Examination (MMSE). Although it is a widely used test, its method of scoring is crude and its psychometric properties are not adequately known. The aim of the present study was to develop a sensitive and reliable method of administration and scoring. Methods: The study sample included 93 normal control subjects (53 women and 40 men aged 35.87 ± 12.62) and 127 patients suffering from schizophrenia (54 women and 73 men aged 34.07 ± 9.83). Results: The scoring method was based on the frequencies of responses of healthy controls and proved to be relatively reliable with Cronbach's α equal to 0.61, test-retest correlation coefficient equal to 0.41 and inter-rater reliability equal to 0.52. The factor analysis produced two indices and six subscales of the Standardised Copy of Pentagons Test (SCPT). The total score as well as most of the individual items and subscales distinguished between controls and patients. The discriminant function correctly classified 63.44% of controls and 75.59% of patients. Discussion: The SCPT seems to be a satisfactory, reliable and valid instrument, which is easy to administer, suitable for use in non-organic psychiatric patients and demands minimal time. Further research is necessary to test its psychometric properties and its usefulness and applications as a neuropsychological test.

  1. Do medical French students know how to properly score a mini mental state examination?

    Science.gov (United States)

    Hernandorena, Intza; Chauvelier, Sophie; Vidal, Jean-Sébastien; Piccoli, Matthieu; Coulon, Joséphine; Hugonot-Diener, Laurence; Rigaud, Anne-Sophie; Hanon, Olivier; Duron, Emmanuelle

    2017-06-01

    The mini mental state examination (MMSE) is a validated tool to assess global cognitive function. Training is required before scoring. Inaccurate scoring can lead to inappropriate medical decisions. In France, the MMSE is usually scored by medical students. This study aimed to assess whether French medical students know how to properly score an MMSE. Two physician-patient role-plays were performed by 2 specialized physicians in front of University Paris V medical students. Role-play A: scoring of an MMSE according to a script containing five tricks; role-play B: finding the 5 errors committed in a pre-filled MMSE form, according to the second script. One hundred and five students (64.4% women, 49.5% in the fifth year of medical school) participated anonymously. Eighty percent of students had already scored an MMSE and 40% had previously been trained in MMSE scoring. Forty-five percent of students had previously scored an MMSE without prior training. In test A, 16% of students did not commit any errors, 45.7% one error and 38.1% two errors. In test B, the proportion of students who provided 0, 1, 2, 3, 4 and 5 good answers was 3.3%, 29.7%, 29.7%, 25.3%, 7.7% and 4.4% respectively. No association was found between medical school year or previous training in MMSE scoring and performance on either test. French students do not properly score the MMSE. MMSE scoring is not sufficiently or accurately taught (by specialists). The university will provide online the tests and a short filmed teaching course performed by neuropsychological specialists.

  2. Comparison of the Abbott RealTime High Risk HPV test and the Roche cobas 4800 HPV test using urine samples.

    Science.gov (United States)

    Lim, Myong Cheol; Lee, Do-Hoon; Hwang, Sang-Hyun; Hwang, Na Rae; Lee, Bomyee; Shin, Hye Young; Jun, Jae Kwan; Yoo, Chong Woo; Lee, Dong Ock; Seo, Sang-Soo; Park, Sang-Yoon; Joo, Jungnam

    2017-05-01

    Human papillomavirus (HPV) testing based on cervical samples is important for use in cervical cancer screening. However, cervical sampling is invasive. Therefore, non-invasive methods for detecting HPV, such as urine samples, are needed. For HPV detection in urine samples, two real-time PCR (RQ-PCR) tests, Roche cobas 4800 test (Roche_HPV; Roche Molecular Diagnostics) and Abbott RealTime High Risk HPV test (Abbott_HPV; Abbott Laboratories) were compared to standard cervical samples. The performance of Roche_HPV and Abbott_HPV for HPV detection was evaluated at the National Cancer Center using 100 paired cervical and urine samples. The tests were also compared using urine samples stored at various temperatures and for a range of durations. The overall agreement between the Roche_HPV and Abbott_HPV tests using urine samples for any hrHPV type was substantial (86.0% with a kappa value of 0.7173), and that for HPV 16/18 was nearly perfect (99.0% with a kappa value of 0.9668). The relative sensitivities (based on cervical samples) for HPV 16/18 detection using Roche_HPV and Abbott_HPV with urine samples were 79.2% (95% CI; 57.9-92.9%) and 81.8% (95% CI; 59.7-94.8%), respectively. When the cut-off C_T value for Abbott_HPV was extended to 40 for urine samples, the relative sensitivity of Abbott_HPV increased to 91.7% from 81.8% for HPV16/18 detection and to 87.0% from 68.5% for other hrHPV detection. The specificity was not affected by the change in the C_T threshold. Roche_HPV and Abbott_HPV showed high concordance. However, HPV DNA detection using urine samples was inferior to HPV DNA detection using cervical samples. Interestingly, when the cut-off C_T value was set to 40, Abbott_HPV using urine samples showed high sensitivity and specificity, comparable to those obtained using cervical samples. Fully automated DNA extraction and detection systems, such as Roche_HPV and Abbott_HPV, could reduce the variability in HPV detection and accelerate the standardization of HPV
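
    The agreement statistics quoted above pair a raw percent agreement with a kappa value. As a hedged illustration of how the two differ, the sketch below computes Cohen's kappa for two tests run on the same samples. The function name and the toy positive/negative calls are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two raters/tests on the same samples.

    `calls_a[i]` and `calls_b[i]` are the categorical results (for example
    1 = positive, 0 = negative) of test A and test B on sample i.
    """
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty result lists of equal length")
    n = len(calls_a)

    # Observed agreement: fraction of samples on which the tests agree.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n

    # Chance agreement: product of the marginal frequencies per category.
    count_a, count_b = Counter(calls_a), Counter(calls_b)
    categories = set(calls_a) | set(calls_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    # Undefined (division by zero) when chance agreement is exactly 1.
    return (observed - expected) / (1 - expected)


# Toy example: the tests agree on 3 of 4 samples (75%), yet kappa is 0.5.
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

    Kappa discounts the agreement expected by chance, which is why 86.0% raw agreement can correspond to a noticeably lower kappa.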

  3. GalaxyDock BP2 score: a hybrid scoring function for accurate protein-ligand docking

    Science.gov (United States)

    Baek, Minkyung; Shin, Woong-Hee; Chung, Hwan Won; Seok, Chaok

    2017-07-01

    Protein-ligand docking is a useful tool for providing atomic-level understanding of protein functions in nature and design principles for artificial ligands or proteins with desired properties. The ability to identify the true binding pose of a ligand to a target protein among numerous possible candidate poses is an essential requirement for successful protein-ligand docking. Many previously developed docking scoring functions were trained to reproduce experimental binding affinities and were also used for scoring binding poses. However, in this study, we developed a new docking scoring function, called GalaxyDock BP2 Score, by directly training the scoring power of binding poses. This function is a hybrid of physics-based, empirical, and knowledge-based score terms that are balanced to strengthen the advantages of each component. The performance of the new scoring function exhibits significant improvement over existing scoring functions in decoy pose discrimination tests. In addition, when the score is used with the GalaxyDock2 protein-ligand docking program, it outperformed other state-of-the-art docking programs in docking tests on the Astex diverse set, the Cross2009 benchmark set, and the Astex non-native set. GalaxyDock BP2 Score and GalaxyDock2 with this score are freely available at http://galaxy.seoklab.org/softwares/galaxydock.html.

  4. A Fault Sample Simulation Approach for Virtual Testability Demonstration Test

    Institute of Scientific and Technical Information of China (English)

    ZHANG Yong; QIU Jing; LIU Guanjun; YANG Peng

    2012-01-01

    Virtual testability demonstration test has many advantages, such as low cost, high efficiency, low risk and few restrictions. It brings new requirements to the fault sample generation. A fault sample simulation approach for virtual testability demonstration test based on stochastic process theory is proposed. First, the similarities and differences of fault sample generation between physical testability demonstration test and virtual testability demonstration test are discussed. Second, it is pointed out that the fault occurrence process subject to perfect repair is a renewal process. Third, the interarrival time distribution function of the next fault event is given. Steps and flowcharts of fault sample generation are introduced. The number of faults and their occurrence time are obtained by statistical simulation. Finally, experiments are carried out on a stable tracking platform. Because a variety of types of life distributions and maintenance modes are considered and some assumptions are removed, the sample size and structure of fault sample simulation results are more similar to the actual results and more reasonable. The proposed method can effectively guide the fault injection in virtual testability demonstration test.
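
    The renewal-process idea described above (perfect repair, so the times between faults are independent draws from one life distribution) can be sketched as a small simulation. This is only an illustration under stated assumptions: a Weibull life distribution, and hypothetical function and parameter names. The abstract does not give the authors' distributions, steps or flowcharts.

```python
import random


def simulate_fault_times(horizon, scale, shape, rng):
    """Simulate fault occurrence times of a renewal process on [0, horizon].

    Assumes perfect repair: after each fault the unit is as good as new,
    so interarrival times are i.i.d. Weibull(scale, shape) draws.
    Returns the list of fault times, which may be empty.
    """
    times = []
    t = 0.0
    while True:
        # Next interarrival time; weibullvariate(alpha=scale, beta=shape).
        t += rng.weibullvariate(scale, shape)
        if t > horizon:
            return times
        times.append(t)


# Hypothetical settings: 1000 h test window, characteristic life of 100 h.
fault_times = simulate_fault_times(1000.0, 100.0, 1.5, random.Random(0))
fault_count = len(fault_times)
```

    Repeating the simulation with different seeds gives an empirical distribution of the number of faults and their occurrence times; other life distributions or imperfect-repair models would need a different interarrival sampler.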

  5. Reliable change indices and standardized regression-based change score norms for evaluating neuropsychological change in children with epilepsy.

    Science.gov (United States)

    Busch, Robyn M; Lineweaver, Tara T; Ferguson, Lisa; Haut, Jennifer S

    2015-06-01

    Reliable change indices (RCIs) and standardized regression-based (SRB) change score norms permit evaluation of meaningful changes in test scores following treatment interventions, like epilepsy surgery, while accounting for test-retest reliability, practice effects, score fluctuations due to error, and relevant clinical and demographic factors. Although these methods are frequently used to assess cognitive change after epilepsy surgery in adults, they have not been widely applied to examine cognitive change in children with epilepsy. The goal of the current study was to develop RCIs and SRB change score norms for use in children with epilepsy. Sixty-three children with epilepsy (age range: 6-16; M=10.19, SD=2.58) underwent comprehensive neuropsychological evaluations at two time points an average of 12 months apart. Practice effect-adjusted RCIs and SRB change score norms were calculated for all cognitive measures in the battery. Practice effects were quite variable across the neuropsychological measures, with the greatest differences observed among older children, particularly on the Children's Memory Scale and Wisconsin Card Sorting Test. There was also notable variability in test-retest reliabilities across measures in the battery, with coefficients ranging from 0.14 to 0.92. Reliable change indices and SRB change score norms for use in assessing meaningful cognitive change in children following epilepsy surgery are provided for measures with reliability coefficients above 0.50. This is the first study to provide RCIs and SRB change score norms for a comprehensive neuropsychological battery based on a large sample of children with epilepsy. Tables to aid in evaluating cognitive changes in children who have undergone epilepsy surgery are provided for clinical use. An Excel sheet to perform all relevant calculations is also available to interested clinicians or researchers. Copyright © 2015 Elsevier Inc. All rights reserved.
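
    A practice effect-adjusted reliable change index can be sketched as follows. The formula is a commonly used formulation (standard error of the difference derived from the baseline standard deviation and the test-retest reliability) and is an assumption here; the abstract does not state the exact variant the authors used, and all names and numbers are hypothetical.

```python
import math


def reliable_change_index(baseline, retest, sd_baseline, reliability,
                          practice_effect=0.0):
    """Practice-adjusted reliable change index (one common formulation).

    SEM     = sd_baseline * sqrt(1 - reliability)
    SE_diff = sqrt(2) * SEM
    RCI     = ((retest - baseline) - practice_effect) / SE_diff
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    sem = sd_baseline * math.sqrt(1.0 - reliability)
    se_diff = math.sqrt(2.0) * sem
    return ((retest - baseline) - practice_effect) / se_diff


# Hypothetical child: score rises from 100 to 110, normative SD 10,
# test-retest reliability 0.50, mean practice effect of 5 points.
rci = reliable_change_index(100, 110, 10, 0.5, practice_effect=5)
```

    An absolute RCI above a chosen cutoff (1.645 for a 90% interval is a frequent choice) is usually read as change beyond measurement error and practice. Low reliability inflates the standard error, which is consistent with the abstract providing norms only for measures with reliability above 0.50.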

  6. How Do Age and Tooth Loss Affect Oral Health Impacts and Quality of Life? A Study Comparing Two State Samples of Gujarat and Rajasthan

    Directory of Open Access Journals (Sweden)

    A. Mathur

    2012-01-01

    Objective: Age and tooth loss are expected to have a complex relationship with oral health-related quality of life. The purpose of this study was to explain the impact of age and tooth loss on oral health-related quality of life using the short-form 14-item Oral Health Impact Profile (OHIP-14) among two population samples of Gujarat and Rajasthan. Materials and Methods: A cross-sectional questionnaire-based survey was conducted among 1441 subjects from two major cities of Gujarat and Rajasthan. Both the questionnaire using the OHIP-14 scale and the clinical examination, in accordance with WHO criteria using the type III procedure, were conducted on the same day. Chi square test, ANOVA and stepwise multiple regression analysis were applied using SPSS software version 15.0. Results: With increasing age, the OHIP mean score in both states increased, but that in Rajasthan was higher, depicting poor oral health. In the group with 23-27 remaining teeth, both states showed a higher OHIP mean, but again the score was much higher among Rajasthan subjects, showing worse oral hygiene. Hence, the overall mean OHIP score for Gujarat was lower, indicating good oral health, whereas that for Rajasthan was higher, indicating poor oral health-related quality of life. Conclusion: Both age and tooth loss are associated with each other, but they have an independent effect on the oral health-related quality of life. Thus, all studied populations with complete natural dentition showed good oral health-related quality of life.

  7. Testing of candidate non-lethal sampling methods for detection of Renibacterium salmoninarum in juvenile Chinook salmon Oncorhynchus tshawytscha

    Science.gov (United States)

    Elliott, Diane G.; McKibben, Constance L.; Conway, Carla M.; Purcell, Maureen K.; Chase, Dorothy M.; Applegate, Lynn M.

    2015-01-01

    Non-lethal pathogen testing can be a useful tool for fish disease research and management. Our research objectives were to determine if (1) fin clips, gill snips, surface mucus scrapings, blood draws, or kidney biopsies could be obtained non-lethally from 3 to 15 g Chinook salmon Oncorhynchus tshawytscha, (2) non-lethal samples could accurately discriminate between fish exposed to the bacterial kidney disease agent Renibacterium salmoninarum and non-exposed fish, and (3) non-lethal samples could serve as proxies for lethal kidney samples to assess infection intensity. Blood draws and kidney biopsies caused ≥5% post-sampling mortality (Objective 1) and may be appropriate only for larger fish, but the other sample types were non-lethal. Sampling was performed over 21 wk following R. salmoninarum immersion challenge of fish from 2 stocks (Objectives 2 and 3), and nested PCR (nPCR) and real-time quantitative PCR (qPCR) results from candidate non-lethal samples were compared with kidney tissue analysis by nPCR, qPCR, bacteriological culture, enzyme-linked immunosorbent assay (ELISA), fluorescent antibody test (FAT) and histopathology/immunohistochemistry. R. salmoninarum was detected by PCR in >50% of fin, gill, and mucus samples from challenged fish. Mucus qPCR was the only non-lethal assay exhibiting both diagnostic sensitivity and specificity estimates >90% for distinguishing between R. salmoninarum-exposed and non-exposed fish and was the best candidate for use as an alternative to lethal kidney sample testing. Mucus qPCR R. salmoninarum quantity estimates reflected changes in kidney bacterial load estimates, as evidenced by significant positive correlations with kidney R. salmoninarum infection intensity scores at all sample times and in both fish stocks, and were not significantly impacted by environmental R. salmoninarum concentrations.

  8. Male-female differences in Scoliosis Research Society-30 scores in adolescent idiopathic scoliosis.

    Science.gov (United States)

    Roberts, David W; Savage, Jason W; Schwartz, Daniel G; Carreon, Leah Y; Sucato, Daniel J; Sanders, James O; Richards, Benjamin Stephens; Lenke, Lawrence G; Emans, John B; Parent, Stefan; Sarwark, John F

    2011-01-01

    Longitudinal cohort study. To compare functional outcomes between male and female patients before and after surgery for adolescent idiopathic scoliosis (AIS). There is no clear consensus in the existing literature with respect to sex differences in functional outcomes in the surgical treatment of AIS. A prospective, consecutive, multicenter database of patients who underwent surgical correction for adolescent idiopathic scoliosis was analyzed retrospectively. All patients completed Scoliosis Research Society-30 (SRS-30) questionnaires before and 2 years after surgery. Patients with previous spine surgery were excluded. Data were collected for sex, age, Risser grade, previous bracing history, maximum preoperative Cobb angle, curve correction at 2 years, and SRS-30 domain scores. Paired sample t tests were used to compare preoperative and postoperative scores within each sex. Independent sample t tests were used to compare scores between sexes. A P value of Self-image/appearance had the greatest relative improvement. Males had better self-image/appearance scores preoperatively, better pain scores at 2 years, and better mental health and total scores both preoperatively and at 2 years. Both males and females were similarly satisfied with surgery. Males treated with surgery for AIS report better preoperative self-image, less postoperative pain, and better mental health than females. These differences may be clinically significant. For both males and females, the most beneficial effect of surgery is improved self-image/appearance. Overall, the benefits of surgery for AIS are similar for both sexes.

  9. Shrinkage-based diagonal Hotelling’s tests for high-dimensional small sample size data

    KAUST Repository

    Dong, Kai

    2015-09-16

    DNA sequencing techniques bring novel tools and also statistical challenges to genetic research. In addition to detecting differentially expressed genes, testing the significance of gene sets or pathway analysis has been recognized as an equally important problem. Owing to the “large p small n” paradigm, the traditional Hotelling’s T² test suffers from the singularity problem and therefore is not valid in this setting. In this paper, we propose a shrinkage-based diagonal Hotelling’s test for both one-sample and two-sample cases. We also suggest several different ways to derive the approximate null distribution under different scenarios of p and n for our proposed shrinkage-based test. Simulation studies show that the proposed method performs comparably to existing competitors when n is moderate or large, but it is better when n is small. In addition, we analyze four gene expression data sets and they demonstrate the advantage of our proposed shrinkage-based diagonal Hotelling’s test.
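
    To show why a diagonal statistic avoids the singularity problem, the sketch below computes a plain (unshrunken) diagonal Hotelling-type two-sample statistic, in which the pooled covariance matrix is replaced by its diagonal. This is explicitly not the shrinkage-based test proposed in the paper: the shrinkage of the variance estimates and the null-distribution approximations are omitted, and the function name and toy data are hypothetical.

```python
from statistics import mean, variance


def diagonal_hotelling_two_sample(x, y):
    """Unshrunken diagonal Hotelling-type statistic for two samples.

    `x` and `y` are lists of observations; each observation is a list of
    p feature values. Only per-feature pooled variances are used, so no
    p x p covariance matrix has to be inverted (it would be singular
    when p exceeds n1 + n2 - 2).
    """
    n1, n2 = len(x), len(y)
    p = len(x[0])
    total = 0.0
    for j in range(p):
        xj = [row[j] for row in x]
        yj = [row[j] for row in y]
        # Pooled sample variance of feature j (must be non-zero).
        pooled = ((n1 - 1) * variance(xj) + (n2 - 1) * variance(yj)) / (n1 + n2 - 2)
        total += (mean(xj) - mean(yj)) ** 2 / pooled
    return (n1 * n2 / (n1 + n2)) * total


# Toy data: two observations per group, two features (illustrative only).
stat = diagonal_hotelling_two_sample([[0, 0], [2, 2]], [[1, 3], [3, 5]])
```

    With very small n the per-feature variances are themselves unstable, which is the motivation the abstract gives for shrinking them.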

  10. Shrinkage-based diagonal Hotelling’s tests for high-dimensional small sample size data

    KAUST Repository

    Dong, Kai; Pang, Herbert; Tong, Tiejun; Genton, Marc G.

    2015-01-01

    DNA sequencing techniques bring novel tools and also statistical challenges to genetic research. In addition to detecting differentially expressed genes, testing the significance of gene sets or pathway analysis has been recognized as an equally important problem. Owing to the “large p small n” paradigm, the traditional Hotelling’s T² test suffers from the singularity problem and therefore is not valid in this setting. In this paper, we propose a shrinkage-based diagonal Hotelling’s test for both one-sample and two-sample cases. We also suggest several different ways to derive the approximate null distribution under different scenarios of p and n for our proposed shrinkage-based test. Simulation studies show that the proposed method performs comparably to existing competitors when n is moderate or large, but it is better when n is small. In addition, we analyze four gene expression data sets and they demonstrate the advantage of our proposed shrinkage-based diagonal Hotelling’s test.

  11. Utilizing the Six Realms of Meaning in Improving Campus Standardized Test Scores through Team Teaching and Strategic Planning

    Science.gov (United States)

    Stevenson, Rosnisha D.; Kritsonis, William Allan

    2009-01-01

    This article will seek to utilize Dr. William Allan Kritsonis' book "Ways of Knowing Through the Realms of Meaning" (2007) as a framework to improve a campus's standardized test scores, more specifically, their TAKS (Texas Assessment of Knowledge and Skills) scores. Many campuses have an improvement plan, also known as a Campus…

  12. Integrating GIS in the Middle School Curriculum: Impacts on Diverse Students' Standardized Test Scores

    Science.gov (United States)

    Goldstein, Donna; Alibrandi, Marsha

    2013-01-01

    This case study conducted with 1,425 middle school students in Palm Beach County, Florida, included a treatment group receiving GIS instruction (256) and a control group without GIS instruction (1,169). Quantitative analyses on standardized test scores indicated that inclusion of GIS in middle school curriculum had a significant effect on student…

  13. Virginia Tech freshman class becoming more competitive; Rise in grades and test scores noted

    OpenAIRE

    Virginia Tech News

    2004-01-01

    Admission to Virginia Tech continues to become more competitive as applicants report higher grade point averages and test scores than previous years. The incoming class of 4,975 students has an average grade point average (GPA) of 3.68 and SAT 1203, up from 3.60 GPA and 1197 SAT in 2003.

  14. Sample Results From The Extraction, Scrub, And Strip Test For The Blended NGS Solvent

    Energy Technology Data Exchange (ETDEWEB)

    Washington, A. L. II [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL); Peters, T. B. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL)

    2014-03-03

    This report summarizes the results of the extraction, scrub, and strip testing for the September 2013 sampling of the Next Generation Solvent (NGS) Blended solvent from the Modular Caustic Side-Solvent Extraction Unit (MCU) Solvent Hold Tank. MCU is in the process of transitioning from the BOBCalixC6 solvent to the NGS Blend solvent. As part of that transition, MCU has intentionally created a blended solvent to be processed using the Salt Batch program. This sample represents the first sample received from that blended solvent. There were two ESS tests performed where NGS blended solvent performance was assessed using either the Tank 21 material utilized in the Salt Batch 7 analyses or a simulant waste material used in the V-5/V-10 contactor testing. This report tabulates the temperature corrected cesium distribution, or DCs values, step recovery percentage, and actual temperatures recorded during the experiment. This report also identifies the sample receipt date, preparation method, and analysis performed in the accumulation of the listed values. The calculated extraction DCs values using the Tank 21H material and simulant are 59.4 and 53.8, respectively. The DCs values for two scrub and three strip processes for the Tank 21 material are 4.58, 2.91, 0.00184, 0.0252, and 0.00575, respectively. The D-values for two scrub and three strip processes for the simulant are 3.47, 2.18, 0.00468, 0.00057, and 0.00572, respectively. These values are similar to previous measurements of Salt Batch 7 feed with lab-prepared blended solvent. These numbers are considered compatible to allow simulant testing to be completed in place of actual waste due to the limited availability of feed material.

  15. Sample Results From The Extraction, Scrub, And Strip Test For The Blended NGS Solvent

    International Nuclear Information System (INIS)

    Washington, A. L. II; Peters, T. B.

    2014-01-01

    This report summarizes the results of the extraction, scrub, and strip testing for the September 2013 sampling of the Next Generation Solvent (NGS) Blended solvent from the Modular Caustic Side-Solvent Extraction Unit (MCU) Solvent Hold Tank. MCU is in the process of transitioning from the BOBCalixC6 solvent to the NGS Blend solvent. As part of that transition, MCU has intentionally created a blended solvent to be processed using the Salt Batch program. This sample represents the first sample received from that blended solvent. There were two ESS tests performed where NGS blended solvent performance was assessed using either the Tank 21 material utilized in the Salt Batch 7 analyses or a simulant waste material used in the V-5/V-10 contactor testing. This report tabulates the temperature corrected cesium distribution, or DCs values, step recovery percentage, and actual temperatures recorded during the experiment. This report also identifies the sample receipt date, preparation method, and analysis performed in the accumulation of the listed values. The calculated extraction DCs values using the Tank 21H material and simulant are 59.4 and 53.8, respectively. The DCs values for two scrub and three strip processes for the Tank 21 material are 4.58, 2.91, 0.00184, 0.0252, and 0.00575, respectively. The D-values for two scrub and three strip processes for the simulant are 3.47, 2.18, 0.00468, 0.00057, and 0.00572, respectively. These values are similar to previous measurements of Salt Batch 7 feed with lab-prepared blended solvent. These numbers are considered compatible to allow simulant testing to be completed in place of actual waste due to the limited availability of feed material

  16. Data Quality Objectives For Selecting Waste Samples To Test The Fluid Bed Steam Reformer Test

    International Nuclear Information System (INIS)

    Banning, D.L.

    2010-01-01

    This document describes the data quality objectives to select archived samples located at the 222-S Laboratory for Fluid Bed Steam Reformer testing. The type, quantity and quality of the data required to select the samples for Fluid Bed Steam Reformer testing are discussed. In order to maximize the efficiency and minimize the time to treat Hanford tank waste in the Waste Treatment and Immobilization Plant, additional treatment processes may be required. One of the potential treatment processes is the fluid bed steam reformer (FBSR). A determination of the adequacy of the FBSR process to treat Hanford tank waste is required. The initial step in determining the adequacy of the FBSR process is to select archived waste samples from the 222-S Laboratory that will be used to test the FBSR process. Analyses of the selected samples will be required to confirm the samples meet the testing criteria.

  17. The Health Professions Admission Test (HPAT) score and leaving certificate results can independently predict academic performance in medical school: do we need both tests?

    LENUS (Irish Health Repository)

    Halpenny, D

    2010-11-01

    A recent study raised concerns regarding the ability of the health professions admission test (HPAT) Ireland to improve the selection process in Irish medical schools. We aimed to establish whether performance in a mock HPAT correlated with academic success in medicine. A modified HPAT examination and a questionnaire were administered to a group of doctors and medical students. There was a significant correlation between HPAT score and college results (r2: 0.314, P = 0.018, Spearman Rank) and between leaving cert score and college results (r2: 0.306, P = 0.049, Spearman Rank). There was no correlation between leaving cert points score and HPAT score. There was no difference in HPAT score across a number of other variables including gender, age and medical speciality. Our results suggest that both the HPAT Ireland and the leaving certificate examination could act as independent predictors of academic achievement in medicine.
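
    The correlations above are reported as Spearman rank statistics. A minimal sketch of a Spearman coefficient follows, computed as the Pearson correlation of average ranks so that tied scores are handled. The function names and the example scores are hypothetical; the abstract does not say how its reported values were computed.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical admission-test scores and college results for four students.
rho = spearman_rho([150, 160, 170, 180], [55, 62, 60, 71])
```

    Because only ranks enter the calculation, the coefficient is insensitive to the scale of either score and to monotone transformations of them.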

  18. Genotoxicity assessment of water sampled from R-11 reservoir by means of allium test

    Energy Technology Data Exchange (ETDEWEB)

    Bukatich, E.; Pryakhin, E. [Urals Research Center for Radiation Medicine (Russian Federation); Geraskin, S. [Russian Institute of Agricultural Radiology and Agroecology (Russian Federation)

    2014-07-01

    slides of root tips meristem were dyed with aceto-orcein. Approximately 150 ana-telophases were scored for each root. 20-40 roots were analyzed for each water sample. In total 3000 - 6000 ana-telophases for each water sample were analyzed. Chromosome aberrations in ana-telophases (chromatid and chromosomal bridges and fragments), mitotic abnormalities (multipolar mitosis and laggards) were scored. The data analysis was arranged using R statistics. Aberration frequency in water samples from the natural control reservoir (0.46 ± 0.12%) exceeded insignificantly the frequency of aberrations in distilled (0.15 ± 0.08%) and bottled waters (0.33 ± 0.08%). Average frequency of aberrant cells in root meristem of onion germinated in water samples from R-11 reservoir (1.36 ± 0.24%) was about 3 times higher compared to control ones. Mitotic activity in root meristem was slightly inhibited in bulbs germinated in R-11 sample, but this effect was statistically insignificant. There was no difference in types of aberrations among all water samples but only in the frequency of abnormalities. So genotoxicity assessment of water sampled from R-11 reservoir by means of allium test shows the presence of genotoxic factor in water from the reservoir. Document available in abstract form only. (authors)

  19. Transforming Biology Assessment with Machine Learning: Automated Scoring of Written Evolutionary Explanations

    Science.gov (United States)

    Nehm, Ross H.; Ha, Minsu; Mayfield, Elijah

    2012-02-01

    This study explored the use of machine learning to automatically evaluate the accuracy of students' written explanations of evolutionary change. Performance of the Summarization Integrated Development Environment (SIDE) program was compared to human expert scoring using a corpus of 2,260 evolutionary explanations written by 565 undergraduate students in response to two different evolution instruments (the EGALT-F and EGALT-P) that contained prompts that differed in various surface features (such as species and traits). We tested human-SIDE scoring correspondence under a series of different training and testing conditions, using Kappa inter-rater agreement values of greater than 0.80 as a performance benchmark. In addition, we examined the effects of response length on scoring success; that is, whether SIDE scoring models functioned with comparable success on short and long responses. We found that SIDE performance was most effective when scoring models were built and tested at the individual item level and that performance degraded when suites of items or entire instruments were used to build and test scoring models. Overall, SIDE was found to be a powerful and cost-effective tool for assessing student knowledge and performance in a complex science domain.

  20. Scoring System Improvements to Three Leadership Predictors

    National Research Council Canada - National Science Library

    Dela

    1997-01-01

    .... The modified scoring systems were evaluated by rescoring responses randomly selected from the sample which had been scored according to the scoring systems originally developed for the leadership research...

  1. Science Teacher Efficacy and Outcome Expectancy as Predictors of Students' End-of-Instruction (EOI) Biology I Test Scores

    Science.gov (United States)

    Angle, Julie; Moseley, Christine

    2009-01-01

    The purpose of this study was to compare teacher efficacy beliefs of secondary Biology I teachers whose students' mean scores on the statewide End-of-Instruction (EOI) Biology I test met or exceeded the state academic proficiency level (Proficient Group) to teacher efficacy beliefs of secondary Biology I teachers whose students' mean scores on the…

  2. Dose Uniformity of Scored and Unscored Tablets: Application of the FDA Tablet Scoring Guidance for Industry.

    Science.gov (United States)

    Ciavarella, Anthony B; Khan, Mansoor A; Gupta, Abhay; Faustino, Patrick J

    This U.S. Food and Drug Administration (FDA) laboratory study examines the impact of tablet splitting, the effect of tablet splitters, and the presence of a tablet score on the dose uniformity of two model drugs. Whole tablets were purchased from five manufacturers for amlodipine and six for gabapentin. Two splitters were used for each drug product, and the gabapentin tablets were also split by hand. Whole and split amlodipine tablets were tested for content uniformity following the general chapter of the United States Pharmacopeia (USP) Uniformity of Dosage Units, which is a requirement of the new FDA Guidance for Industry on tablet scoring. The USP weight variation method was used for gabapentin split tablets based on the recommendation of the guidance. All whole tablets met the USP acceptance criteria for the Uniformity of Dosage Units. Variation in whole tablet content ranged from 0.5 to 2.1 standard deviation (SD) of the percent label claim. Splitting the unscored amlodipine tablets resulted in a significant increase in dose variability of 6.5-25.4 SD when compared to whole tablets. Split tablets from all amlodipine drug products did not meet the USP acceptance criteria for content uniformity. Variation in the weight for gabapentin split tablets was greater than the whole tablets, ranging from 1.3 to 9.3 SD. All fully scored gabapentin products met the USP acceptance criteria for weight variation. Size, shape, and the presence or absence of a tablet score can affect the content uniformity and weight variation of amlodipine and gabapentin tablets. Tablet splitting produced higher variability. Differences in dose variability and fragmentation were observed between tablet splitters and hand splitting. These results are consistent with the FDA's concerns that tablet splitting can have an effect on the amount of drug present in a split tablet and available for absorption. Tablet splitting has become a very common practice in the United States and throughout the

  3. Acceptance test report for core sample trucks 3 and 4

    International Nuclear Information System (INIS)

    Corbett, J.E.

    1996-01-01

    The purpose of this Acceptance Test Report is to provide documentation for the acceptance testing of the rotary mode core sample trucks 3 and 4, designated as HO-68K-4600 and HO-68K-4647, respectively. This report conforms to the guidelines established in WHC-IP-1026, "Engineering Practice Guidelines," Appendix M, "Acceptance Test Procedures and Reports." Rotary mode core sample trucks 3 and 4 were based upon the design of the second core sample truck (HO-68K-4345) which was constructed to implement rotary mode sampling of the waste tanks at Hanford. Successful completion of acceptance testing on June 30, 1995 verified that all design requirements were met. This report is divided into four sections, beginning with general information. Acceptance testing was performed on trucks 3 and 4 during the months of March through June, 1995. All testing was performed at the "Rock Slinger" test site in the 200 West area. The sequence of testing was determined by equipment availability, and the initial revision of the Acceptance Test Procedure (ATP) was used for both trucks. Testing was directed by ICF-KH, with the support of WHC Characterization Equipment Engineering and Characterization Project Operations. Testing was completed per the ATP without discrepancies or deviations, except as noted

  4. A Danish diabetes risk score for targeted screening: the Inter99 study.

    Science.gov (United States)

    Glümer, Charlotte; Carstensen, Bendix; Sandbaek, Annelli; Lauritzen, Torsten; Jørgensen, Torben; Borch-Johnsen, Knut

    2004-03-01

    To develop a simple self-administered questionnaire identifying individuals with undiagnosed diabetes with a sensitivity of 75% and minimizing the high-risk group needing subsequent testing. A population-based sample (Inter99 study) of 6,784 individuals aged 30-60 years completed a questionnaire on diabetes-related symptoms and risk factors. The participants underwent an oral glucose tolerance test. The risk score was derived from the first half and validated on the second half of the study population. External validation was performed based on the Danish Anglo-Danish-Dutch Study of Intensive Treatment in People with Screen Detected Diabetes in Primary Care (ADDITION) pilot study. The risk score was developed by stepwise backward multiple logistic regression. The final risk score included age, sex, BMI, known hypertension, physical activity at leisure time, and family history of diabetes, items independently and significantly (Pscreening strategy for type 2 diabetes, decreasing the numbers of subsequent tests and thereby possibly minimizing the economical and personal costs of the screening strategy.

  5. A Kolmogorov-Smirnov Based Test for Comparing the Predictive Accuracy of Two Sets of Forecasts

    Directory of Open Access Journals (Sweden)

    Hossein Hassani

    2015-08-01

    This paper introduces a complementary statistical test for distinguishing between the predictive accuracy of two sets of forecasts. We propose a non-parametric test founded upon the principles of the Kolmogorov-Smirnov (KS) test, referred to as the KS Predictive Accuracy (KSPA) test. The KSPA test is able to serve two distinct purposes. First, the test seeks to determine whether there exists a statistically significant difference between the distributions of forecast errors; second, it exploits the principles of stochastic dominance to determine whether the forecasts with the lower error also report a stochastically smaller error than forecasts from a competing model, and thereby enables distinguishing between the predictive accuracy of forecasts. We perform a simulation study for the size and power of the proposed test and report the results for different noise distributions, sample sizes and forecasting horizons. The simulation results indicate that the KSPA test is correctly sized and robust in the face of varying forecasting horizons and sample sizes, with significant accuracy gains reported especially in the case of small sample sizes. Real world applications are also considered to illustrate the applicability of the proposed KSPA test in practice.
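
    The first purpose of the KSPA test, checking whether two sets of forecast errors share the same distribution, can be illustrated with a two-sided two-sample KS statistic. The following is a minimal sketch and not the authors' implementation: the function name `ks_two_sample`, the choice to compare absolute errors, and the asymptotic p-value approximation are assumptions made here for illustration, and the one-sided stochastic dominance step is not covered.

```python
import math
from bisect import bisect_right


def ks_two_sample(errors_a, errors_b):
    """Two-sided two-sample KS statistic D with an approximate p-value.

    Hypothetical helper: compares the empirical distributions of the
    absolute forecast errors of two competing models.
    """
    a = sorted(abs(e) for e in errors_a)
    b = sorted(abs(e) for e in errors_b)
    n, m = len(a), len(b)

    # D is the largest vertical gap between the two empirical CDFs,
    # checked at every observed error value.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)

    # Asymptotic Kolmogorov approximation with a small-sample correction.
    # This is only approximate when n and m are small.
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        # The tail probability is indistinguishable from 1 here, and the
        # alternating series below converges slowly for tiny lam.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)
```

    A small p-value would suggest that the two error distributions differ; deciding which model has the stochastically smaller error requires a separate one-sided comparison.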

  6. Tests on CANDU fuel elements sheath samples

    International Nuclear Information System (INIS)

    Ionescu, S.; Uta, O.; Mincu, M.; Prisecaru, I.

    2016-01-01

    This work is a study of the behavior of CANDU fuel elements after irradiation. The tests are made on ring samples taken from fuel cladding at INR Pitesti. This paper presents the results of examinations performed in the Post Irradiation Examination Laboratory. By metallographic and ceramographic examination we determined that the hydride precipitates are oriented parallel to the cladding surface. A hydrogen content of about 120 ppm was estimated. After the preliminary tests, ring samples were cut from the fuel rod and were subjected to tensile testing on an INSTRON 5569 machine in order to evaluate the changes in their mechanical properties as a consequence of irradiation. Scanning electron microscopy was performed on a TESCAN MIRA II LMU CS microscope with Schottky FE emitter and variable pressure. The analysis shows that the central zone has deeper dimples, whereas on the outer zone the dimples are tilted and smaller. (authors)

  7. WISC-III subtests of similarities, vocabulary and comprehension: objective or subjective scoring? / Subtestes semelhanças, vocabulário e compreensão do WISC-III: pontuação objetiva ou subjetiva?

    Directory of Open Access Journals (Sweden)

    Vera Lucia Marques de Figueiredo

    2010-01-01

    In all psychological tests, scoring should be of concern for examiners because the accuracy of results depends, to some extent, on the quality of the correction. This work aims to examine the correction, by different psychologists, of the scores for the Wechsler Intelligence Scale for Children (WISC-III) subtests of Similarities, Vocabulary and Comprehension, since these are the subtests where the examiner's subjectivity seemingly most influences scoring. Forty-two psychologists from different states in Brazil participated in this study. They corrected the answers of six test protocols randomly selected from a standardization sample for the Brazilian context. Taking the total scores as reference, the Vocabulary subtest showed the greatest variability in score, followed by Comprehension. Considering the total number of items tested in each subtest, Similarities had the highest agreement among raters. The results showed that all three subtests involve subjectivity on the part of the examiner in scoring the answers. In continuing this study, we also aim to determine test reliability based on interrater agreement.

  8. Analysis of fingerprint samples, testing various conditions, for forensic DNA identification.

    Science.gov (United States)

    Ostojic, Lana; Wurmbach, Elisa

    2017-01-01

    Fingerprints can be of tremendous value for forensic biology, since they can be collected from a wide variety of evidence types, such as handles of weapons, tools collected in criminal cases, and objects with no apparent staining. DNA obtained from fingerprints varies greatly in quality and quantity, which ultimately affects the quality of the resulting STR profiles. Additional difficulties can arise when fingerprint samples show mixed STR profiles due to handling by multiple persons. After applying a tested protocol for sample collection (swabbing with 5% Triton X-100), DNA extraction (using an enzyme that works at elevated temperatures), and PCR amplification (AmpFlSTR® Identifiler® using 31 cycles), extensive analysis was performed to better understand the challenges inherent to fingerprint samples, with the ultimate goal of developing valuable profiles (≥50% complete). The impact of time on deposited fingerprints was investigated, revealing that while the quality of profiles deteriorated, full STR profiles could still be obtained from samples after 40 days of storage at room temperature. By comparing the STR profiles from fingerprints of the dominant versus the non-dominant hand, we found a slightly better quality from the non-dominant hand, which was not always significant. Substrates seem to have greater effects on fingerprints. Tests on glass, plastic, paper and metal (US Quarter dollar, made of Cu and Ni), common substrates in offices and homes, showed best results for glass, followed by plastic and paper, while almost no profiles were obtained from a Quarter dollar. Important for forensic casework, we also assessed three-person mixtures of touched fingerprint samples. Unlike routinely used approaches for sampling evidence, the surface of an object (bottle) was sectioned into six equal parts and separate samples were taken from each section. The samples were processed separately for DNA extraction and STR amplification. The results included a few single

  9. Associations between cadmium exposure and neurocognitive test scores in a cross-sectional study of US adults.

    Science.gov (United States)

    Ciesielski, Timothy; Bellinger, David C; Schwartz, Joel; Hauser, Russ; Wright, Robert O

    2013-02-05

    Low-level environmental cadmium exposure and neurotoxicity have not been well studied in adults. Our goal was to evaluate associations between neurocognitive exam scores and a biomarker of cumulative cadmium exposure among adults in the Third National Health and Nutrition Examination Survey (NHANES III). NHANES III is a nationally representative cross-sectional survey of the U.S. population conducted between 1988 and 1994. We analyzed data from a subset of participants, age 20-59, who participated in a computer-based neurocognitive evaluation. There were four outcome measures: the Simple Reaction Time Test (SRTT: visual motor speed), the Symbol Digit Substitution Test (SDST: attention/perception), the Serial Digit Learning Test (SDLT) trials-to-criterion, and the SDLT total-error-score (SDLT-tests: learning recall/short-term memory). We fit multivariable-adjusted models to estimate associations between urinary cadmium concentrations and test scores. 5662 participants underwent neurocognitive screening, and 5572 (98%) of these had a urinary cadmium level available. Prior to multivariable-adjustment, higher urinary cadmium concentration was associated with worse performance in each of the 4 outcomes. After multivariable-adjustment most of these relationships were not significant, and age was the most influential variable in reducing the association magnitudes. However, among never-smokers with no known occupational cadmium exposure, the relationship between urinary cadmium and SDST score (attention/perception) was significant: a 1 μg/L increase in urinary cadmium corresponded to a 1.93% (95%CI: 0.05, 3.81) decrement in performance. These results suggest that higher cumulative cadmium exposure in adults may be related to subtly decreased performance in tasks requiring attention and perception, particularly among those adults whose cadmium exposure is primarily through diet (no smoking or work based cadmium exposure). This association was observed among exposure levels

  10. COMPARISON BETWEEN WOOD DRYING DEFECT SCORES: SPECIMEN TESTING X ANALYSIS OF KILN-DRIED BOARDS

    Directory of Open Access Journals (Sweden)

    Djeison Cesar Batista

    2015-04-01

    It is important to develop drying technologies for Eucalyptus grandis lumber, which is one of the most planted species of this genus in Brazil and plays an important role as raw material for the wood industry. The general aim of this work was to assess the conventional kiln drying of juvenile wood of three clones of Eucalyptus grandis. The specific aims were to compare the behavior between: (i) drying defects indicated by tests with wood specimens and conventional kiln-dried boards; and (ii) physical properties and the drying quality. Five 11-year-old trees of each clone were felled, and only flatsawn boards of the first log were used. Basic density and total shrinkage were determined, and the drying test with wood specimens at 100 °C was carried out. Kiln drying of boards was performed, and initial and final moisture content, moisture gradient in thickness, drying stresses and drying defects were assessed. The defect scoring method was used to verify the behavior between the defects detected by specimen testing and the defects detected in kiln-dried boards. As the main result, the drying schedule was too severe for the wood, resulting in a high proportion of boards with defects. According to the defect scoring method, the defects in the drying test with specimens did not correspond to the defects of kiln-dried boards.

  11. A novel approach for small sample size family-based association studies: sequential tests.

    Science.gov (United States)

    Ilk, Ozlem; Rajabli, Farid; Dungul, Dilay Ciglidag; Ozdag, Hilal; Ilk, Hakki Gokhan

    2011-08-01

    In this paper, we propose a sequential probability ratio test (SPRT) to overcome the problem of limited samples in studies related to complex genetic diseases. The results of this novel approach are compared with those obtained from the traditional transmission disequilibrium test (TDT) on simulated data. Although TDT classifies single-nucleotide polymorphisms (SNPs) into only two groups (SNPs associated with the disease and the others), SPRT has the flexibility of assigning SNPs to a third group, that is, those for which we do not have enough evidence and should keep sampling. It is shown that SPRT results in smaller ratios of false positives and negatives, as well as better accuracy and sensitivity values for classifying SNPs when compared with TDT. By using SPRT, data with small sample size become usable for an accurate association analysis.
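
    The three-way outcome described above (associated, not associated, keep sampling) is characteristic of Wald's SPRT. The sketch below illustrates it for a simple Bernoulli setting and is not the authors' procedure: the function name `sprt_bernoulli`, the null probability `p0 = 0.5`, the alternative `p1 = 0.7`, and the error rates are hypothetical values chosen for illustration.

```python
import math


def sprt_bernoulli(observations, p0=0.5, p1=0.7, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for Bernoulli data.

    Hypothetical setup: each observation is 1 or 0, H0 says the
    success probability is p0 and H1 says it is p1.
    Returns (decision, number_of_observations_used).
    """
    # Wald's approximate decision boundaries on the log-likelihood ratio.
    upper = math.log((1 - beta) / alpha)   # cross this: accept H1
    lower = math.log(beta / (1 - alpha))   # cross this: accept H0

    llr = 0.0
    for n, x in enumerate(observations, start=1):
        # Add the log-likelihood ratio contribution of one observation.
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "associated", n
        if llr <= lower:
            return "not associated", n

    # Neither boundary was reached: the evidence is still inconclusive.
    return "keep sampling", len(observations)
```

    With these assumed settings, a run of identical observations reaches a boundary after a handful of samples, while balanced data stays in the "keep sampling" region.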

  12. Comparison of middle latency responses in presbycusis patients with two different speech recognition scores.

    Science.gov (United States)

    Kirkim, Gunay; Madanoglu, Nevma; Akdas, Ferda; Serbetcioglu, M Bulent

    2007-12-01

    The purpose of this study is to evaluate whether the middle latency responses (MLR) can be used for an objective differentiation of patients with presbycusis having relatively good (Group I) and relatively poor speech recognition scores (Group II). All the participants of these groups had high frequency down-sloping hearing loss with an average of 26-60 dB HL. Data were collected from two described study groups and a control group, using pure tone audiometry, monosyllabic phonetically balanced word and synthetic sentence identification, as well as MLR. The study groups were compared with the control group. When patients in Group I were compared with the control group, only ipsilateral Na latency of middle latency evoked response was statistically significant in the right ear whereas ipsilateral Na latency in the right ear, ipsilateral and contralateral Na latency in the left ear of the patients in Group II were statistically significant. Thus, as an objective complementary tool for the evaluation of the speech perception ability of the patients with presbycusis, Na latency of MLR may be used in combination with the speech discrimination tests.

  13. The effect of an intervention program on functional movement screen test scores in mixed martial arts athletes.

    Science.gov (United States)

    Bodden, Jamie G; Needham, Robert A; Chockalingam, Nachiappan

    2015-01-01

    This study assessed the basic fundamental movements of mixed martial arts (MMA) athletes using the functional movement screen (FMS) assessment and determined if an intervention program was successful at improving results. Participants were placed into 1 of 2 groups: intervention and control. The intervention group was required to complete a corrective exercise program 4 times per week, and all participants were asked to continue their usual MMA training routine. A mid-intervention FMS test was included to examine if successful results were noticed sooner than the 8-week period. Results highlighted differences in FMS test scores between the control group and intervention group (p = 0.006). Post hoc testing revealed a significant increase in the FMS score of the intervention group between weeks 0 and 8 (p = 0.00) and weeks 0 and 4 (p = 0.00) and no significant increase between weeks 4 and 8 (p = 1.00). A χ² analysis revealed that the intervention group participants were more likely to have an FMS score >14 than participants in the control group at week 4 (χ² = 7.29, p < 0.01) and week 8 (χ² = 5.2, p ≤ 0.05). Finally, a greater number of participants in the intervention group were free from asymmetry at week 4 and week 8 compared with the initial test period. The results of the study suggested that a 4-week intervention program was sufficient to improve FMS scores. Most, if not all, of the movements covered on the FMS relate to many aspects of MMA training. The knowledge that the FMS can identify movement dysfunctions and that these issues can be improved through a standardized intervention program could be advantageous to MMA coaches, providing the opportunity to adapt and implement new additions to training programs.

  14. Walk Score® and Transit Score® and Walking in the Multi-Ethnic Study of Atherosclerosis

    Science.gov (United States)

    Hirsch, Jana A.; Moore, Kari A.; Evenson, Kelly R.; Rodriguez, Daniel A; Diez Roux, Ana V.

    2013-01-01

    Background: Walk Score® and Transit Score® are open-source measures of the neighborhood built environment to support walking (“walkability”) and access to transportation. Purpose: To investigate associations of Street Smart Walk Score and Transit Score with self-reported transport and leisure walking using data from a large multi-city and diverse population-based sample of adults. Methods: Data from a sample of 4552 residents of Baltimore MD; Chicago IL; Forsyth County NC; Los Angeles CA; New York NY; and St. Paul MN from the Multi-Ethnic Study of Atherosclerosis (2010–2012) were linked to Walk Score and Transit Score (collected in 2012). Logistic and linear regression models estimated ORs of not walking and mean differences in minutes walked, respectively, associated with continuous and categoric Walk Score and Transit Score. All analyses were conducted in 2012. Results: After adjustment for site, key sociodemographic, and health variables, a higher Walk Score was associated with lower odds of not walking for transport and more minutes/week of transport walking. Compared to those in a “walker’s paradise,” lower categories of Walk Score were associated with a linear increase in odds of not transport walking and a decline in minutes of leisure walking. An increase in Transit Score was associated with lower odds of not transport walking or leisure walking, and additional minutes/week of leisure walking. Conclusions: Walk Score and Transit Score appear to be useful as measures of walkability in analyses of neighborhood effects. PMID:23867022

  15. Examination of the Five Comparable Component Scores of the Diet Quality Indexes HEI-2005 and RC-DQI Using a Nationally Representative Sample of 2–18 Year Old Children: NHANES 2003–2006

    Directory of Open Access Journals (Sweden)

    Sibylle Kranz

    2013-01-01

    Obesity has been associated with low diet quality and the suboptimal intake of food groups and nutrients. Two composite diet quality measurement tools are appropriate for Americans 2–18 years old: the Healthy Eating Index (HEI) 2005 and the Revised Children’s Diet Quality Index (RC-DQI). The five components included in both indexes are fruits, vegetables, total grains, whole grains, and milk/dairy. Component scores ranged from 0 to 5 or 0 to 10 points, with lower scores indicating suboptimal intake. To allow direct comparisons, one component was rescaled by dividing it by 2; then, all components ranged from 0 to 5 points. The aim of this study was to directly compare the scoring results of these five components using dietary data from a nationally representative sample of children (NHANES 2003–2006). Correlation coefficients within and between indexes showed less internal consistency in the HEI; age- and ethnic-group stratified analyses indicated higher sensitivity of the RC-DQI. HEI scoring was likely to dichotomize the population into two groups (those with 0 and those with 5 points), while RC-DQI scores resulted in a wider distribution of scores. The scoring scheme of diet quality indexes for children results in great variation of the outcomes, and researchers must be aware of those effects.
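
    The rescaling step mentioned in the abstract (dividing a 0 to 10 component by 2 so that every component spans 0 to 5) amounts to a linear change of scale. A minimal sketch follows; the function name `rescale_component` and its parameters are hypothetical and are not taken from either index's scoring documentation.

```python
def rescale_component(score, original_max, target_max=5.0):
    """Linearly rescale a component score from [0, original_max] to [0, target_max].

    Hypothetical helper: with original_max=10 and target_max=5 this is
    the same as dividing the score by 2.
    """
    if not 0 <= score <= original_max:
        raise ValueError("score must lie between 0 and original_max")
    return score * target_max / original_max
```

    A component already on the 0 to 5 scale passes through unchanged, so the same helper could be applied to all five components before comparing the two indexes.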

  16. MALDI-TOF mass spectrometry and high-consequence bacteria: safety and stability of biothreat bacterial sample testing in clinical diagnostic laboratories.

    Science.gov (United States)

    Tracz, Dobryan M; Tober, Ashley D; Antonation, Kym S; Corbett, Cindi R

    2018-03-01

    We considered the application of MALDI-TOF mass spectrometry for BSL-3 bacterial diagnostics, with a focus on the biosafety of live-culture direct-colony testing and the stability of stored extracts. Biosafety level 2 (BSL-2) bacterial species were used as surrogates for BSL-3 high-consequence pathogens in all live-culture MALDI-TOF experiments. Viable BSL-2 bacteria were isolated from MALDI-TOF mass spectrometry target plates after 'direct-colony' and 'on-plate' extraction testing, suggesting that the matrix chemicals alone cannot be considered sufficient to inactivate bacterial culture and spores in all samples. Sampling of the instrument interior after direct-colony analysis did not recover viable organisms, suggesting that any potential risks to the laboratory technician are associated with preparation of the MALDI-TOF target plate before or after testing. Secondly, a long-term stability study (3 years) of stored MALDI-TOF extracts showed that match scores can decrease below the threshold for reliable species identification (<1.7), which has implications for proficiency test panel item storage and distribution.

  17. Effects of Public Preschool Expenditures on the Test Scores of 4th Graders: Evidence from TIMSS.

    Science.gov (United States)

    Waldfogel, Jane; Zhai, Fuhua

    2008-02-01

    This study examines the effects of public preschool expenditures on the math and science scores of 4th graders, holding constant child, family, and school characteristics, other relevant social expenditures, and country and year effects, in seven Organization for Economic Co-operation and Development (OECD) countries (Australia, Japan, Netherlands, New Zealand, Norway, U.K., and U.S.) using data from the 1995 and 2003 Trends in International Mathematics and Science Study (TIMSS). Our results indicate that there are small but significant positive effects of public preschool expenditures on the math and science scores of 4th graders and that preschool expenditures reduce the risk of children scoring at the low level of proficiency. We also find some evidence that children from low-resource homes and homes where the test language is not always spoken may tend to gain more from increased public preschool expenditures than other children.

  18. Neurocognitive function in HIV-infected patients: comparison of two methods to define impairment.

    Directory of Open Access Journals (Sweden)

    Alejandro Arenas-Pinto

    To compare two definitions of neurocognitive impairment (NCI) in a large clinical trial of effectively-treated HIV-infected adults at baseline. Hopkins Verbal Learning Test-Revised (HVLT-R), Colour Trail (CTT) and Grooved Pegboard (GPT) tests were applied exploring five cognitive domains. Raw scores were transformed into Z-scores and NCI defined as summary NPZ-5 score one standard deviation below the mean of the normative dataset (i.e. <-1SD) or Z-scores <-1SD in at least two individual domains (categorical scale). Principal component analysis (PCA) was performed to explore the contribution of individual tests to the total variance. Mean NPZ-5 score was -0.72 (SD 0.98) and 178/548 (32%) participants had NPZ-5 scores <-1SD. When impairment was defined as <-1SD in at least two individual tests, 283 (52%) patients were impaired. Strong correlations between the two components of the HVLT-R test (learning/recall) (r = 0.73), and of the CTT (attention/executive functioning) (r = 0.66) were observed. PCA showed a clustering with three components accounting for 88% of the total variance. When patients who scored <-1SD only in two correlated tests were considered as not impaired, prevalence of NCI was 43%. When correlated test scores were averaged, 36% of participants had NPZ-3 scores <-1SD and 32% underperformed in at least two individual tests. Controlling for differential contribution of individual test-scores on the overall performance and the level of correlation between components of the test battery used appear to be important when testing cognitive function. These two factors are likely to affect both summary scores and categorical scales in defining cognitive impairment. EUDRACT: 2007-006448-23 and ISRCTN04857074.
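
    The two impairment definitions compared in this abstract can be written down compactly: a summary score (the mean of the domain Z-scores) below -1, versus at least two individual domain Z-scores below -1. The sketch below only illustrates that logic; the function names, the `higher_is_better` flag for timed tests, and the example values in the usage note are assumptions, not the trial's actual scoring code or data.

```python
from statistics import mean


def z_score(raw, norm_mean, norm_sd, higher_is_better=True):
    """Standardize a raw test score against normative data.

    For timed tests, where a larger raw value means worse performance,
    the sign is flipped so that negative Z always means underperformance.
    """
    z = (raw - norm_mean) / norm_sd
    return z if higher_is_better else -z


def classify_impairment(domain_z, cutoff=-1.0):
    """Apply both definitions to one participant's domain Z-scores."""
    summary = mean(domain_z)  # summary NPZ score across domains
    return {
        "summary_z": summary,
        # Definition 1: summary score more than 1 SD below the normative mean.
        "impaired_summary": summary < cutoff,
        # Definition 2: at least two individual domains below the cutoff.
        "impaired_categorical": sum(z < cutoff for z in domain_z) >= 2,
    }
```

    For a hypothetical participant with domain Z-scores of -1.5, -1.2, 0.5, 0.8 and 0.4, the summary score is about -0.2 (not impaired) while two domains fall below -1 (impaired), which shows how the two definitions can disagree.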

  19. The score statistic of the LD-lod analysis: detecting linkage adaptive to linkage disequilibrium.

    Science.gov (United States)

    Huang, J; Jiang, Y

    2001-01-01

    We study the properties of a modified lod score method for testing linkage that incorporates linkage disequilibrium (LD-lod). By examination of its score statistic, we show that the LD-lod score method adaptively combines two sources of information: (a) the IBD sharing score which is informative for linkage regardless of the existence of LD and (b) the contrast between allele-specific IBD sharing scores which is informative for linkage only in the presence of LD. We also consider the connection between the LD-lod score method and the transmission-disequilibrium test (TDT) for triad data and the mean test for affected sib pair (ASP) data. We show that, for triad data, the recessive LD-lod test is asymptotically equivalent to the TDT; and for ASP data, it is an adaptive combination of the TDT and the ASP mean test. We demonstrate that the LD-lod score method has relatively good statistical efficiency in comparison with the ASP mean test and the TDT for a broad range of LD and the genetic models considered in this report. Therefore, the LD-lod score method is an interesting approach for detecting linkage when the extent of LD is unknown, such as in a genome-wide screen with a dense set of genetic markers. Copyright 2001 S. Karger AG, Basel

  20. Renal dysfunction in liver cirrhosis and its correlation with Child-Pugh score and MELD score

    Science.gov (United States)

    Siregar, G. A.; Gurning, M.

    2018-03-01

    Renal dysfunction (RD) is a serious and common complication in patients with liver cirrhosis and carries a poor prognosis. The aim of our study was to evaluate renal function in liver cirrhosis and to determine its correlation with the grade of liver disease as assessed by Child-Pugh Score (CPS) and MELD score. This was a cross-sectional study that included patients with liver cirrhosis admitted to Adam Malik Hospital Medan in June - August 2016. We divided them into two groups as not having renal dysfunction (serum creatinine SPSS 22.0 was used. Statistical methods used: Chi-square, Fisher exact, one way ANOVA, Kruskal Wallis test and Pearson coefficient of correlation. The level of significance was p<0.05. Of the 55 patients, 16 (29.1%) presented with renal dysfunction. There was a statistically significant inverse correlation between GFR and CPS (r = -0.308) and between GFR and MELD score (r = -0.278). There was a statistically significant correlation between creatinine and MELD score (r = 0.359) and between creatinine and CPS (r = 0.382). An increase in the degree of liver damage is related to an increase in renal dysfunction.

  1. Testing and sampling of deep brine aquifers in the Palo Duro Basin, West Texas

    International Nuclear Information System (INIS)

    Deyling, M.A.

    1984-01-01

    The US Department of Energy is investigating the Palo Duro Basin of West Texas along with locations in Nevada, Washington, Utah, Mississippi and Louisiana as potential sites for storage of high-level nuclear waste. Ten wells have been drilled to depths between 3000 and 8300 feet. Testing and sampling of deep test zones requires advance planning and analysis of what must be obtained from the well. Various alternatives are available depending on data needs. In this particular instance, both hydrologic and geochemical data were required. The methods chosen were field proven methods used in the oil field industry for many years. Short term testing has included conventional oil-field-type drill stem tests and drill stem equipment with surface pressure readout. Long term testing has consisted of a series of production and recovery tests. Fluid sampling was performed in two stages. The first was at the well head under an imposed pressure of several hundred psi. The second fluid samples were collected downhole at the production zone under pressures close to ambient pressure. The geochemical data and hydrologic data can be used as independent checks on each other in many cases. Test results from the well along with examination of recovered core provided maximum data for each well. 5 references, 8 figures

  2. A comparison between modified Alvarado score and RIPASA score in the diagnosis of acute appendicitis.

    Science.gov (United States)

    Singla, Anand; Singla, Satpaul; Singh, Mohinder; Singla, Deeksha

    2016-12-01

    Acute appendicitis is a common but elusive surgical condition and remains a diagnostic dilemma. It has many clinical mimickers and diagnosis is primarily made on clinical grounds, leading to the evolution of clinical scoring systems for pinpointing the right diagnosis. The modified Alvarado and RIPASA scoring systems are two important scoring systems for the diagnosis of acute appendicitis. We prospectively compared the two scoring systems for diagnosing acute appendicitis in 50 patients presenting with right iliac fossa pain. The RIPASA score correctly classified 88% of patients with histologically confirmed acute appendicitis compared with 48.0% with the modified Alvarado score, indicating that the RIPASA score is superior to the modified Alvarado score in our clinical setting.

  3. The Validity of Graduate Management Admission Test Scores: A Summary of Studies Conducted from 1997 to 2004

    Science.gov (United States)

    Talento-Miller, Eileen; Rudner, Lawrence M.

    2008-01-01

    The validity of Graduate Management Admission Test (GMAT) scores is examined by summarizing 273 studies conducted between 1997 and 2004. Each of the studies was conducted through the Validity Study Service of the test sponsor and contained identical variables and statistical methods. Validity coefficients from each of the studies were corrected…

  4. Evaluation of ELISA screening test for detecting aflatoxin in biogenic dust samples

    Energy Technology Data Exchange (ETDEWEB)

    Durant, J.T.

    1996-05-01

    Aflatoxin is a carcinogenic chemical that is sometimes produced when agricultural commodities are infested by the fungi Aspergillus flavus and A. parasiticus. Aflatoxin has been found to be present in air samples taken around persons handling materials likely to be contaminated. The purpose of this investigation was to demonstrate the feasibility of applying an Enzyme Linked Immunosorbent Assay (ELISA) test kit, developed to screen for aflatoxin in bulk agricultural commodities, to air samples. Samples were taken from two environments likely to be contaminated with aflatoxin, a dairy farm feed mixing operation and a peanut bagging operation. The dust collected from these environments was considered to be biogenic, in that it originated primarily from biological materials.

  5. Sample preparation guidelines for two-dimensional electrophoresis.

    Science.gov (United States)

    Posch, Anton

    2014-12-01

    Sample preparation is one of the key technologies for successful two-dimensional electrophoresis (2DE). Due to the great diversity of protein sample types and sources, no single sample preparation method works with all proteins; for any sample the optimum procedure must be determined empirically. This review is meant to provide a broad overview of the most important principles in sample preparation in order to avoid a multitude of possible pitfalls. Sample preparation protocols from experts in the field were screened and evaluated. On the basis of these protocols and my own comprehensive practical experience, important guidelines are given in this review. The presented guidelines will facilitate straightforward protocol development for researchers new to gel-based proteomics. In addition, the available choices are rationalized in order to successfully prepare a protein sample for 2DE separations. The strategies described here are not limited to 2DE and can also be applied to other protein separation techniques.

  6. Effect of two yoga-based relaxation techniques on memory scores and state anxiety

    Directory of Open Access Journals (Sweden)

    Telles Shirley

    2009-08-01

    Background: A yoga practice involving cycles of yoga postures and supine rest (called cyclic meditation) was previously shown to improve performance in attention tasks more than relaxation in the corpse posture (shavasana). This was ascribed to reduced anxiety, though this was not assessed. Methods: In fifty-seven male volunteers (group average age ± S.D., 26.6 ± 4.5 years) the immediate effect of two yoga relaxation techniques was studied on memory and state anxiety. All participants were assessed before and after (i) cyclic meditation (CM) practiced for 22:30 minutes on one day and (ii) an equal duration of supine rest (SR), or the corpse posture (shavasana), on another day. Sections of the Wechsler memory scale (WMS) were used to assess (i) attention and concentration (digit span forward and backward), and (ii) associate learning. State anxiety was assessed using Spielberger's State-Trait Anxiety Inventory (STAI). Results: There was a significant improvement in the scores of all sections of the WMS studied after both CM and SR, but the magnitude of change was greater after CM than after SR. The state anxiety scores decreased after both CM and SR, with a greater magnitude of decrease after CM. There was no correlation between percentage change in memory scores and state anxiety for either session. Conclusion: A cyclical combination of yoga postures and supine rest in CM improved memory scores immediately after the practice and decreased state anxiety more than rest in a classical yoga relaxation posture (shavasana).

  7. Results from Testing of Two Rotary Percussive Drilling Systems

    Science.gov (United States)

    Kriechbaum, Kristopher; Brown, Kyle; Cady, Ian; von der Heydt, Max; Klein, Kerry; Kulczycki, Eric; Okon, Avi

    2010-01-01

    The developmental test program for the MSL (Mars Science Laboratory) rotary percussive drill examined the effect of various drill input parameters on the drill penetration rate. Some of the input parameters tested were drill angle with respect to gravity and percussive impact energy. The suite of rocks tested ranged from a high strength basalt to soft Kaolinite clay. We developed a hole start routine to reduce high sideloads from bit walk. The ongoing development test program for the IMSAH (Integrated Mars Sample Acquisition and Handling) rotary percussive corer uses many of the same rocks as the MSL suite. An additional performance parameter is core integrity. The MSL development test drill and the IMSAH test drill use similar hardware to provide rotation and percussion. However, the MSL test drill uses external stabilizers, while the IMSAH test drill does not have external stabilization. In addition, the IMSAH drill is a core drill, while the MSL drill uses a solid powdering bit. Results from the testing of these two related drilling systems are examined.

  8. Report of testing and sampling of municipal supply well PM-4

    International Nuclear Information System (INIS)

    Koch, Richard J.; Longmire, Patrick; Rogers, David B.; Mullen, Ken

    1999-01-01

    During drilling of regional aquifer characterization borehole R-25, located in the western part of Los Alamos National Laboratory (LANL) at Technical Area (TA) 16, groundwater samples were collected from perched zones of saturation and the regional aquifer that contained elevated levels of high explosive (HE) compounds. One of the nearest Los Alamos County municipal supply wells potentially located down gradient from borehole R-25 is PM-4, located on Mesita del Buey at the west end of TA-54. During the winter of 1998 and 1999 the pump in PM-4 had been removed from the well for scheduled maintenance by the Los Alamos County Public Utilities Department (PUD). Because the pump was removed from PM-4, the opportunity existed to enter the well to (1) perform tests to determine where within the regional aquifer groundwater entered the well and (2) collect groundwater samples from the producing zones for analyses to determine if HE contaminants were present in discrete zones within the regional aquifer. The report of the activities that were performed during March 1999 for the testing and sampling of municipal supply well PM-4 is provided. The report provides a description of the field activities associated with the two phases of the project, including (1) the results of the static and dynamic spinner log surveys, and (2) a description of the sampling activities and the field-measured groundwater quality parameters that were obtained during sampling activities. This report also provides the analytical results of the groundwater samples and a brief discussion of the results of the project

  9. Standardization of a Volumetric Displacement Measurement for Two-Body Abrasion Scratch Test Data Analysis

    Science.gov (United States)

    Street, K. W. Jr.; Kobrick, R. L.; Klaus, D. M.

    2011-01-01

    A limitation has been identified in the existing test standards used for making controlled, two-body abrasion scratch measurements based solely on the width of the resultant score on the surface of the material. A new, more robust method is proposed for analyzing a surface scratch that takes into account the full three-dimensional profile of the displaced material. To accomplish this, a set of four volume-displacement metrics was systematically defined by normalizing the overall surface profile to denote statistically the area of relevance, termed the Zone of Interaction. From this baseline, depth of the trough and height of the plowed material are factored into the overall deformation assessment. Proof-of-concept data were collected and analyzed to demonstrate the performance of this proposed methodology. This technique takes advantage of advanced imaging capabilities that allow resolution of the scratched surface to be quantified in greater detail than was previously achievable. When reviewing existing data analysis techniques for conducting two-body abrasive scratch tests, it was found that the ASTM International Standard G 171 specified a generic metric based only on visually determined scratch width as a way to compare abraded materials. A limitation to this method was identified in that the scratch width is based on optical surface measurements, manually defined by approximating the boundaries, but does not consider the three-dimensional volume of material that was displaced. With large, potentially irregular deformations occurring on softer materials, it becomes unclear where to systematically determine the scratch width. Specifically, surface scratches on different samples may look the same from a top view, resulting in an identical scratch width measurement, but may vary in actual penetration depth and/or plowing deformation. Therefore, two different scratch profiles would be measured as having identical abrasion properties, although they differ

  10. Results from tests of TFL Hydragard sampling loop

    International Nuclear Information System (INIS)

    Steimke, J.L.

    1995-03-01

    When the Defense Waste Processing Facility (DWPF) is operational, processed radioactive sludge will be transferred in batches to the Slurry Mix Evaporator (SME), where glass frit will be added and the contents concentrated by boiling. Batches of the slurry mixture are transferred from the SME to the Melter Feed Tank (MFT). Hydragard® sampling systems are used on the SME and the MFT for collecting slurry samples in vials for chemical analysis. An accurate replica of the Hydragard sampling system was built and tested in the Thermal Fluids Laboratory (TFL) to determine the Hydragard accuracy. It was determined that the original Hydragard valve frequently drew a non-representative sample stream through the sample vial that ranged from frit enriched to frit depleted. The Hydragard valve was modified by moving the plunger and its seat backwards so that the outer surface of the plunger was flush with the inside diameter of the transfer line when the valve was open. The slurry flowing through the vial accurately represented the composition of the slurry in the reservoir for two types of slurries, different dilution factors, a range of transfer flows and a range of vial flows. It was then found that the 15 ml of slurry left in the vial when the Hydragard valve was closed, which is what will be analyzed at DWPF, had a lower ratio of frit to sludge, as characterized by the lithium to iron ratio, than the slurry flowing through it. The reason for these differences is not understood at this time, but it is recommended that additional experimentation be performed with the TFL Hydragard loop to determine the cause.

  11. Group SkSP-R sampling plan for accelerated life tests

    Indian Academy of Sciences (India)

    Muhammad Aslam

    2017-09-15

    Sep 15, 2017 ... SkSP-R sampling; life test; Weibull distribution; producer's risk; ... designed a sampling plan under a time-truncated life test .... adjusted using an acceleration factor. ... where P is the probability of lot acceptance for a single.

  12. Sway Area and Velocity Correlated With MobileMat Balance Error Scoring System (BESS) Scores.

    Science.gov (United States)

    Caccese, Jaclyn B; Buckley, Thomas A; Kaminski, Thomas W

    2016-08-01

    The Balance Error Scoring System (BESS) is often used for sport-related concussion balance assessment. However, moderate intratester and intertester reliability may cause low initial sensitivity, suggesting that a more objective balance assessment method is needed. The MobileMat BESS was designed for objective BESS scoring, but the outcome measures must be validated with reliable balance measures. Thus, the purpose of this investigation was to compare MobileMat BESS scores to linear and nonlinear measures of balance. Eighty-eight healthy collegiate student-athletes (age: 20.0 ± 1.4 y, height: 177.7 ± 10.7 cm, mass: 74.8 ± 13.7 kg) completed the MobileMat BESS. MobileMat BESS scores were compared with 95% area, sway velocity, approximate entropy, and sample entropy. MobileMat BESS scores were significantly correlated with 95% area for single-leg (r = .332) and tandem firm (r = .474), and double-leg foam (r = .660); and with sway velocity for single-leg (r = .406) and tandem firm (r = .601), and double-leg (r = .575) and single-leg foam (r = .434). MobileMat BESS scores were not correlated with approximate or sample entropy. MobileMat BESS scores were low to moderately correlated with linear measures, suggesting the ability to identify changes in the center of mass-center of pressure relationship, but not higher-order processing associated with nonlinear measures. These results suggest that the MobileMat BESS may be a clinically-useful tool that provides objective linear balance measures.

  13. Speech-discrimination scores modeled as a binomial variable.

    Science.gov (United States)

    Thornton, A R; Raffin, M J

    1978-09-01

    Many studies have reported variability data for tests of speech discrimination, and the disparate results of these studies have not been given a simple explanation. Arguments over the relative merits of 25- vs 50-word tests have ignored the basic mathematical properties inherent in the use of percentage scores. The present study models performance on clinical tests of speech discrimination as a binomial variable. A binomial model was developed, and some of its characteristics were tested against data from 4120 scores obtained on the CID Auditory Test W-22. A table for determining significant deviations between scores was generated and compared to observed differences in half-list scores for the W-22 tests. Good agreement was found between predicted and observed values. Implications of the binomial characteristics of speech-discrimination scores are discussed.
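
    The binomial argument in this abstract can be illustrated with a minimal sketch. It assumes each word is an independent trial with the same probability of being correct, and it uses a plain normal approximation for the critical difference; the function names are hypothetical, and this is not the procedure the authors used to generate their table.

```python
import math


def binomial_sd_percent(true_percent: float, n_words: int) -> float:
    """Standard deviation, in percentage points, of a percent-correct score
    when each of n_words items is an independent trial (binomial model)."""
    p = true_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_words)


def critical_difference_percent(true_percent: float, n_words: int,
                                z: float = 1.96) -> float:
    """Approximate two-sided critical difference between two independent
    scores from lists of equal length (normal approximation, an assumption)."""
    return z * math.sqrt(2.0) * binomial_sd_percent(true_percent, n_words)


if __name__ == "__main__":
    # Compare a 25-word and a 50-word list at the same underlying ability.
    for n in (25, 50):
        sd = binomial_sd_percent(80.0, n)
        cd = critical_difference_percent(80.0, n)
        print(f"{n}-word list: SD = {sd:.2f} points, critical difference = {cd:.1f} points")
```

    Under these assumptions the variability of a percentage score depends only on the underlying proportion and the list length, which is why a shorter list needs a larger score difference before two results can be called different.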

  14. Actigraphy-based sleep estimation in adolescents and adults: a comparison with polysomnography using two scoring algorithms

    Directory of Open Access Journals (Sweden)

    Quante M

    2018-01-01

    Mirja Quante,1–3 Emily R Kaplan,2 Michael Cailler,2 Michael Rueschman,2 Rui Wang,2–5 Jia Weng,2 Elsie M Taveras,3,5,6 Susan Redline2,3,7 1Department of Neonatology, University of Tuebingen, Tuebingen, Germany; 2Division of Sleep and Circadian Disorders, Departments of Medicine and Neurology, Brigham and Women’s Hospital, Boston, MA, USA; 3Harvard Medical School, Boston, MA, USA; 4Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA; 5Department of Population Medicine, Harvard Medical School and The Harvard Pilgrim Health Care Institute, Boston, MA, USA; 6Division of General Academic Pediatrics, Department of Pediatrics, MassGeneral Hospital for Children, Boston, MA, USA; 7Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA Objectives: Actigraphy is widely used to estimate sleep–wake time, despite limited information regarding the comparability of different devices and algorithms. We compared estimates of sleep–wake times determined by two wrist actigraphs (GT3X+ versus Actiwatch Spectrum [AWS]) to in-home polysomnography (PSG), using two algorithms (Sadeh and Cole–Kripke) for the GT3X+ recordings. Subjects and methods: Participants included a sample of 35 healthy volunteers (13 school children and 22 adults, 46% male) from Boston, MA, USA. Twenty-two adults wore the GT3X+ and AWS simultaneously for at least five consecutive days and nights. In addition, actigraphy and PSG were concurrently measured in 12 of these adults and another 13 children over a single night. We used intraclass correlation coefficients (ICCs), epoch-by-epoch comparisons, paired t-tests, and Bland–Altman plots to determine the level of agreement between actigraphy and PSG, and differences between devices and algorithms. Results: Each actigraph showed comparable accuracy (0.81–0.86) for sleep–wake estimation compared to PSG. When analyzing data from the GT3X+, the Cole–Kripke algorithm was more
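
    An epoch-by-epoch comparison of the kind mentioned in this abstract can be sketched as follows. The sketch assumes both recordings are already aligned and scored per epoch as 1 (sleep) or 0 (wake), treats PSG as the reference, and defines sensitivity as the proportion of PSG sleep epochs detected; the function name and the toy data are hypothetical and do not come from the study.

```python
from typing import Dict, Sequence


def epoch_agreement(actigraphy: Sequence[int], psg: Sequence[int]) -> Dict[str, float]:
    """Compare per-epoch sleep (1) / wake (0) labels against PSG as reference."""
    if len(actigraphy) != len(psg) or len(psg) == 0:
        raise ValueError("recordings must be non-empty and the same length")

    pairs = list(zip(actigraphy, psg))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # sleep scored as sleep
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # wake scored as wake
    fp = sum(1 for a, p in pairs if a == 1 and p == 0)  # wake scored as sleep
    fn = sum(1 for a, p in pairs if a == 0 and p == 1)  # sleep scored as wake

    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


if __name__ == "__main__":
    # Toy data: ten epochs, one wake epoch misclassified as sleep.
    psg_labels = [1, 1, 1, 1, 0, 0, 1, 1, 0, 1]
    acti_labels = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1]
    print(epoch_agreement(acti_labels, psg_labels))
```

    Reporting specificity alongside accuracy matters here because sleep epochs usually dominate a night's recording, so overall accuracy alone can hide poor wake detection.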

  15. The Effect of English Language on Multiple Choice Question Scores of Thai Medical Students.

    Science.gov (United States)

    Phisalprapa, Pochamana; Muangkaew, Wayuda; Assanasen, Jintana; Kunavisarut, Tada; Thongngarm, Torpong; Ruchutrakool, Theera; Kobwanthanakun, Surapon; Dejsomritrutai, Wanchai

    2016-04-01

    Universities in Thailand are preparing for Thailand's integration into the ASEAN Economic Community (AEC) by increasing the number of tests in English language. English language is not the native language of Thailand. Differences in English language proficiency may affect scores among test-takers, even when subject knowledge among test-takers is comparable and may falsely represent the knowledge level of the test-taker. To study the impact of English language multiple choice test questions on test scores of medical students. The final examination of fourth-year medical students completing internal medicine rotation contains 120 multiple choice questions (MCQ). The languages used on the test are Thai and English at a ratio of 3:1. Individual scores of tests taken in both languages were collected and the effect of English language on MCQ was analyzed. Individual MCQ scores were then compared with individual student English language proficiency and student grade point average (GPA). Two hundred ninety-five fourth-year medical students were enrolled. The mean percentage of MCQ scores in Thai and English were significantly different (65.0 ± 8.4 and 56.5 ± 12.4, respectively, p English was fair (Spearman's correlation coefficient = 0.41, p English than in Thai language. Students were classified into six grade categories (A, B+, B, C+, C, and D+), which cumulatively measured total internal medicine rotation performance score plus final examination score. MCQ scores from Thai language examination were more closely correlated with total course grades than were the scores from English language examination (Spearman's correlation coefficient = 0.73 (p English proficiency score was very high, at 3.71 ± 0.35 from a total of 4.00. Mean student GPA was 3.40 ± 0.33 from a possible 4.00. English language MCQ examination scores were more highly associated with GPA than with English language proficiency. The use of English language multiple choice question test may decrease scores

  16. Comparison of formula and number-right scoring in undergraduate medical training: a Rasch model analysis.

    Science.gov (United States)

    Cecilio-Fernandes, Dario; Medema, Harro; Collares, Carlos Fernando; Schuwirth, Lambert; Cohen-Schotanus, Janke; Tio, René A

    2017-11-09

    Progress testing is an assessment tool used to periodically assess all students at the end-of-curriculum level. Because students cannot know everything, it is important that they recognize their lack of knowledge. For that reason, the formula-scoring method has usually been used. However, where partial knowledge needs to be taken into account, the number-right scoring method is used. Research comparing both methods has yielded conflicting results. As far as we know, in all these studies, Classical Test Theory or Generalizability Theory was used to analyze the data. In contrast to these studies, we will explore the use of the Rasch model to compare both methods. A 2 × 2 crossover design was used in a study where 298 students from four medical schools participated. A sample of 200 previously used questions from the progress tests was selected. The data were analyzed using the Rasch model, which provides fit parameters, reliability coefficients, and response option analysis. The fit parameters were in the optimal interval ranging from 0.50 to 1.50, and the means were around 1.00. The person and item reliability coefficients were higher in the number-right condition than in the formula-scoring condition. The response option analysis showed that the majority of dysfunctional items emerged in the formula-scoring condition. The findings of this study support the use of number-right scoring over formula scoring. Rasch model analyses showed that tests with number-right scoring have better psychometric properties than formula scoring. However, choosing the appropriate scoring method should depend not only on psychometric properties but also on self-directed test-taking strategies and metacognitive skills.
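
    The two scoring rules compared in this abstract can be written out in a minimal sketch. It assumes the classic correction-for-guessing form of formula scoring, in which each wrong answer costs 1/(k - 1) of a point for k answer options and omitted items score zero; the abstract does not state the exact penalty used in the study, so that choice and the function names are assumptions.

```python
def number_right_score(n_correct: int) -> float:
    """Number-right scoring: one point per correct answer, no penalty."""
    return float(n_correct)


def formula_score(n_correct: int, n_wrong: int, n_options: int) -> float:
    """Formula scoring with the classic guessing correction (an assumption):
    correct answers minus wrong answers divided by (options - 1).
    Omitted items contribute nothing."""
    if n_options < 2:
        raise ValueError("an item needs at least two answer options")
    return n_correct - n_wrong / (n_options - 1)


if __name__ == "__main__":
    # Hypothetical student on a 200-item test with four options per item:
    # 120 correct, 40 wrong, 40 left unanswered.
    print("number-right:", number_right_score(120))
    print("formula:", round(formula_score(120, 40, 4), 2))
```

    With this penalty, a student who guesses blindly on every four-option item expects a formula score of zero, which is the design intent behind asking students to recognize what they do not know.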

  17. Quality standards for sample collection in coagulation testing.

    Science.gov (United States)

    Lippi, Giuseppe; Salvagno, Gian Luca; Montagnana, Martina; Lima-Oliveira, Gabriel; Guidi, Gian Cesare; Favaloro, Emmanuel J

    2012-09-01

    Preanalytical activities, especially those directly connected with blood sample collection and handling, are the most vulnerable steps throughout the testing process. The receipt of unsuitable samples is commonplace in laboratory practice and represents a serious problem, given the reliability of test results can be adversely compromised following analysis of these specimens. The basic criteria for an appropriate and safe venipuncture are nearly identical to those used for collecting blood for clinical chemistry and immunochemistry testing, and entail proper patient identification, use of the correct technique, as well as appropriate devices and needles. There are, however, some peculiar aspects, which are deemed to be particularly critical when collecting quality specimens for clot-based tests, and these require clearer recognition. These include prevention of prolonged venous stasis, collection of nonhemolyzed specimens, order of draw, and appropriate filling and mixing of the primary collection tubes. All of these important preanalytical issues are discussed in this article, and evidence-based suggestions as well as recommendations on how to obtain a high-quality sample for coagulation testing are also illustrated. We have also performed an investigation aimed to identify variation of test results due to underfilling of primary blood tubes, and have identified a clinically significant bias in test results when tubes are drawn at less than 89% of total fill for activated partial thromboplastin time, less than 78% for fibrinogen, and less than 67% for coagulation factor VIII, whereas prothrombin time and activated protein C resistance remain relatively reliable even in tubes drawn at 67% of the nominal volume. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.

  18. Test results of the first 50 kA NbTi full size sample for ITER

    International Nuclear Information System (INIS)

    Ciazynski, D.; Zani, L.; Huber, S.; Stepanov, B.; Karlemo, B.

    2003-01-01

    Within the framework of the research studies for the International Thermonuclear Experimental Reactor (ITER) project, the first full size NbTi conductor sample was fabricated in industry and tested in the SULTAN facility (Villigen, Switzerland). This sample (PF-FSJS), which is relevant to the Poloidal Field coils of ITER, is composed of two parallel straight bars of conductor, connected at the bottom through a joint designed according to the CEA twin-box concept. The two conductor legs are identical except for the use of different strands: a nickel plated NbTi strand with a pure copper matrix in one leg, and a bare NbTi strand with copper matrix and internal CuNi barrier in the other leg. The two conductors and the joint were extensively tested regarding DC (direct current) and AC (alternating current) properties. This paper reports on the test results and analysis, stressing the differences between the two conductor legs and discussing the impact of the test results on the ITER design criteria for conductor and joint. While the joint DC resistance and the conductor and joint AC losses fulfilled the ITER requirements, neither conductor could reach its current sharing temperature at relevant ITER currents, due to instabilities. Although the drop in temperature is slight for the CuNi strand cable, it is more significant for the Ni plated strand cable. (authors)

  19. The Sinonasal Outcome Test 22 score in persons without chronic rhinosinusitis

    DEFF Research Database (Denmark)

    Lange, Bibi; Thilsing, T; Baelum, J

    2016-01-01

    -67 with a mean score of 10.5 (CI: 9.1 - 11.9) and the median score was 7. Persons with allergic rhinitis and blue collar workers had a significantly higher score. CONCLUSION: The median value of 7 is taken as the normal SNOT 22 score in persons without CRS and can be used as a reference in clinical settings...... and research. Allergic rhinitis and occupation affect SNOT 22 in persons without CRS. This article is protected by copyright. All rights reserved....

  20. Mineralogic and petrologic investigation of post-test core samples from the Spent Fuel Test - Climax

    International Nuclear Information System (INIS)

    Ryerson, F.J.; Beiriger, J.

    1985-02-01

    We have characterized a suite of samples taken subsequent to the end of the Spent Fuel Test - Climax by petrographic and microanalytical techniques and determined their mineral assemblage, modal properties, and mineral chemistry. The samples were obtained immediately adjacent to the canister borehole at a variety of depths and positions within the canister drift, as well as radially outward from each canister hole. This method of sampling allows variations in post-test mineralogic properties to be evaluated on the basis of (1) depth along a particular canister hole and (2) position within the canister drift, with respect to the heat and radiation sources, and with respect to the pre-test samples. In no case did we find any significant correlation between the mineralogical properties and the variables listed above. In short, the Spent Fuel Test - Climax has produced no identifiable mineralogical response in the Climax quartz monzonite. 12 refs., 11 figs., 5 tabs