In recent years, there have been several debates about the effectiveness of breast cancer screening as a public health policy: concerns have been raised about how often one should be screened, at what age regular screening should start, and whether one should be screened before age 50. This talk will explore these questions by modeling the cancer disease process and investigating the performance of screening exams in two ways. 1) A mixed model is developed to estimate the diagnostic error rates of cancer screening exams, to draw inferences about the accuracy of a screening test, and to quantify the effects of explanatory variables, while accounting for the heterogeneity and unobserved cancers inherent in population-level administrative data. A Markov chain Monte Carlo (MCMC) algorithm is described for estimating the posterior distributions of the sensitivity, specificity and prevalence when the reference standard is imperfect. 2) A variety of techniques are discussed that can estimate the sojourn time and the sensitivity of the screen using cohort data on the observed prevalence of breast cancer at successive screens and on the incidence of disease during the intervals between screens. How screening sensitivity and mean sojourn time vary across age groups is also investigated.
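The two estimation problems in this abstract can be illustrated with deliberately simplified stand-ins, sketched below under stated assumptions; neither is the talk's mixed model nor its exact sojourn-time methods. The first function is a standard latent-class Gibbs sampler for a single test with no reference standard, where beta priors on prevalence, sensitivity and specificity make the problem estimable; it ignores covariates and random effects. The second pair assumes a steady-state model with exponentially distributed sojourn times (rate λ = 1/MST) and a one-year screening interval, under which the expected first-screen prevalence is S·I·MST and the expected interval-cancer incidence t years after a screen is I(1 − S·e^(−λt)); solving these two moment equations gives S and MST. All function names, priors and parameter values are hypothetical.

```python
import math
import random


def _binomial(rng, n, p):
    # Stdlib-only Binomial(n, p) draw as a sum of Bernoulli trials (adequate for a sketch).
    return sum(1 for _ in range(n) if rng.random() < p)


def gibbs_no_gold_standard(n_tested, n_positive, priors, n_iter=5000, burn_in=1000, seed=1):
    """Latent-class Gibbs sampler for prevalence, sensitivity and specificity of one test
    with no reference standard (simplified assumption: no covariates, no random effects).
    `priors` maps "prev", "se", "sp" to Beta (a, b) pairs."""
    rng = random.Random(seed)
    n_negative = n_tested - n_positive
    prev, se, sp = 0.5, 0.8, 0.8  # arbitrary starting values
    draws = []
    for it in range(n_iter):
        # Step 1: impute the unobserved true disease status among test positives and negatives.
        p_tp = prev * se / (prev * se + (1 - prev) * (1 - sp))
        p_fn = prev * (1 - se) / (prev * (1 - se) + (1 - prev) * sp)
        tp = _binomial(rng, n_positive, p_tp)  # diseased among test positives
        fn = _binomial(rng, n_negative, p_fn)  # diseased among test negatives
        # Step 2: conjugate Beta updates given the completed 2x2 table.
        a, b = priors["prev"]
        prev = rng.betavariate(a + tp + fn, b + n_tested - tp - fn)
        a, b = priors["se"]
        se = rng.betavariate(a + tp, b + fn)
        a, b = priors["sp"]
        sp = rng.betavariate(a + n_negative - fn, b + n_positive - tp)
        if it >= burn_in:
            draws.append((prev, se, sp))
    return draws


def screening_moments(sensitivity, mean_sojourn, incidence):
    """Assumed steady-state exponential model: expected first-screen prevalence and the
    mean ratio of interval-cancer incidence to underlying incidence over a 1-year interval."""
    lam = 1.0 / mean_sojourn
    prevalence = sensitivity * incidence * mean_sojourn
    interval_ratio = 1.0 - sensitivity * (1.0 - math.exp(-lam)) / lam
    return prevalence, interval_ratio


def solve_sensitivity_sojourn(prevalence, interval_ratio, incidence):
    """Method-of-moments inversion of screening_moments: returns (sensitivity, mean sojourn)."""
    x = (1.0 - interval_ratio) * incidence / prevalence  # equals 1 - exp(-lambda)
    if not 0.0 < x < 1.0:
        raise ValueError("moments are inconsistent with the exponential model")
    lam = -math.log(1.0 - x)
    return lam * prevalence / incidence, 1.0 / lam
```

Because a single test without a gold standard is not identified by the data alone, the sampler's posterior depends heavily on the sensitivity and specificity priors. `solve_sensitivity_sojourn` inverts `screening_moments`, so feeding it the forward model's output recovers the inputs.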
In this presentation, we will share our team's experience from the recent SSC Case Study Competition, including what inspired us, the challenges we faced, and our practical recommendations. We will present the statistical models we considered for our chosen study, with the discussion focusing on methods such as principal component analysis, segmented regression and multinomial regression.
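Segmented regression is often used to separate an immediate change in level from a change in trend after an event. The sketch below is a hypothetical illustration, not the team's actual model: it fits the common interrupted-time-series specification y = b0 + b1·t + b2·post + b3·(t − t0)·post by ordinary least squares in plain Python. The time index, the breakpoint t0 and the function names are assumptions.

```python
def segmented_design(times, breakpoint):
    """Design rows [1, t, post, (t - t0) * post] for a single known breakpoint t0."""
    rows = []
    for t in times:
        post = 1.0 if t >= breakpoint else 0.0
        rows.append([1.0, float(t), post, (t - breakpoint) * post])
    return rows


def least_squares(x, y):
    """Solve the normal equations X^T X b = X^T y by Gaussian elimination with partial pivoting."""
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):  # back substitution
        beta[i] = (aug[i][p] - sum(aug[i][j] * beta[j] for j in range(i + 1, p))) / aug[i][i]
    return beta  # [b0, b1, b2 (level change), b3 (slope change)]
```

Here b2 estimates the immediate shift at the breakpoint and b3 the change in slope afterwards. Plain OLS ignores autocorrelation in the errors, which real survey time series usually need to address.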
This seminar will include two parts. In the first part, I will introduce the concept of 'gene sets' as applied to information derived from genomic, transcriptomic or proteomic experiments. I will explain the principles of gene set analysis, with an emphasis on gene set over-representation and gene set enrichment analyses. Practical examples will show how to interpret gene set analysis results using enrichment maps, along with an introduction to the software tools used. In the second part, I will provide hands-on practice with gene set over-representation and gene set enrichment analyses, and with visualizing their results, using different omics data.
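Over-representation analysis is commonly based on a one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test): given a background of measured genes, it asks how surprising the overlap between a user's gene list and a gene set is. The function below is a minimal sketch of that calculation; the function and argument names are hypothetical, it does not cover rank-based gene set enrichment analysis, and in practice p-values across many gene sets would also be adjusted for multiple testing.

```python
from math import comb


def ora_pvalue(universe_size, set_size, selected_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap) for gene set over-representation.

    universe_size: genes measured (the background)
    set_size:      background genes annotated to the gene set
    selected_size: genes in the user's list (e.g. differentially expressed genes)
    overlap:       genes in both the list and the gene set
    """
    total = comb(universe_size, selected_size)
    upper = min(set_size, selected_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For instance, with 20 background genes, a 5-gene set, a 4-gene list and 3 genes in common, `ora_pvalue(20, 5, 4, 3)` equals 155/4845, about 0.032.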
The SSC Case Study Competition is a one-of-a-kind opportunity to transfer knowledge from the classroom to practical applications on real-world problems. Participating in the competition can be both demanding and rewarding, but most importantly, it is an invaluable learning experience. Our group participated in the case study that examined Statistics Canada Labour Force Survey (LFS) data to explore both the short- and long-term effects of the 2008 financial crisis on youth employment in Canada. In addition to presenting the methods and results of our study, we will share our personal experiences, point out some challenges we faced, and highlight some tips for success.
Life history studies are often conducted by monitoring individuals over time and collecting information on the occurrence of events or conditions that characterize a disease process. Multistate models, which conceptualize life history data as sequences of states and transitions between these states, are the most commonly used framework for studying such data.
There are two main questions in the design of life history studies: how are individuals selected for the study, and what life history information is recorded, and how? The sampling and observation schemes have an important impact on the analysis. A prevalent cohort study is an efficient design for studying the rate of disease progression in many chronic conditions; only individuals who are in a certain state at recruitment (e.g. alive with the condition or disease) are included. If mortality rates differ between individuals with and without the disease, the probability of a person being selected for the study depends on their disease history, creating selection bias. This problem has been well studied in survival analysis for length-biased time-to-event data, but relatively little work has been done in the more general multistate framework for modelling complex disease processes. We quantify the difference between the transition intensities in cohorts under response-dependent sampling schemes and those of the population. We examine a hierarchy of conditional likelihoods relying on different assumptions about the available data and demonstrate efficiency gains through simulation studies. The second part of this work considers the complications due to incomplete observation of life history data. In particular, we focus on the situation where a composite endpoint, progression-free survival, is used to assess the treatment effect in cancer trials, while progression status is observed only at periodic assessment times. We examine the asymptotic and empirical properties of estimators of the marginal (progression-free) survival functions and associated treatment effects under right endpoint imputation, a routinely adopted approach to dealing with interval censoring in practice. Specifically, we explore the determinants of the asymptotic bias and highlight that there is typically a loss of power in tests for treatment effects.
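The length-biased special case mentioned in this abstract can be seen in a small simulation. The sketch below assumes a simple two-state setting (alive with disease, then dead) rather than the talk's multistate framework: disease onsets occur uniformly over a hypothetical calendar window, time from onset to death is exponential, and only people still alive at the recruitment date are enrolled. Names and parameter values are illustrative assumptions.

```python
import random


def simulate_prevalent_cohort(n_people, mean_duration, window, seed=0):
    """Return (all disease durations, durations of people enrolled in a prevalent cohort).

    Assumptions for illustration: onset times are uniform on [0, window], time from
    onset to death is exponential, and recruitment happens at time `window`.
    """
    rng = random.Random(seed)
    population, enrolled = [], []
    for _ in range(n_people):
        onset = rng.uniform(0.0, window)
        duration = rng.expovariate(1.0 / mean_duration)
        population.append(duration)
        if onset + duration > window:  # still alive with the disease at recruitment
            enrolled.append(duration)
    return population, enrolled
```

With `simulate_prevalent_cohort(100_000, 2.0, 50.0)`, the mean duration in the full population should be close to 2, while the enrolled cohort's mean should be close to 4: under length-biased sampling the observed durations have density t·f(t)/μ, whose mean for an exponential distribution is 2μ.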
Maternal exposure to environmental chemicals may cause adverse effects on child development, especially during early gestation. Studies have mostly focused on prenatal exposure to individual chemicals of interest, ignoring the fact that women are exposed to multiple chemicals daily. In this study, we investigate patterns in maternal biomonitoring data for twenty-nine chemicals across maternal characteristics, using data from the Maternal-Infant Research on Environmental Chemicals (MIREC) Study. Principal component analysis (PCA) was used to identify chemicals with similar patterns and to reduce the dimension of the dataset. Cluster analysis was subsequently used to categorize participants based on their socio-demographic variables, followed by hypothesis testing to determine whether the mean converted concentrations of chemical substances differ significantly among women with different characteristics. Eleven components, explaining approximately 70% of the variance, were retained, and participants were grouped into six main clusters. In particular, for pregnant women, one component was dominated by persistent organic pollutants and another by phthalates. The results showed that concentrations of chemical mixtures are strongly associated with participant characteristics. Future studies may therefore benefit from analyzing multiple exposures to environmental chemicals in relation to their health effects on pregnant women and children.
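The dimension-reduction step can be sketched as PCA on the correlation matrix of the chemical measurements, keeping the smallest number of components whose cumulative share of variance reaches a cutoff. The abstract does not say how the eleven components were chosen, so the 70% cumulative-variance rule here is an assumption, as are the function names; transformations of the concentrations, handling of values below detection limits and the subsequent cluster analysis are not shown. The code uses only the standard library (a cyclic Jacobi eigenvalue routine) to stay self-contained; a numerical library would normally be used instead.

```python
import math


def correlation_matrix(rows):
    """Sample correlation matrix of the columns of `rows` (n observations x p variables).
    Assumes no column is constant."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]


def jacobi_eigenvalues(a, max_sweeps=100, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Rotation angle chosen so the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def components_for_variance(eigenvalues, threshold=0.70):
    """Smallest number of leading components whose cumulative share of variance >= threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(eigenvalues, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)
```

Each eigenvalue divided by the number of chemicals is that component's share of the total variance, because the trace of a correlation matrix equals the number of variables.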