Ken's Notes

Molecular Test Validation Guidelines

· notes from AMP Molecular Diagnostic Assay Validation (2009) and its update (2014)

· Checklist:

· adequate number of each of the expected specimen types

· reference method comparison

· reference range and/or reportable ranges determined

· Laboratory Director signed off written procedure

· Reporting criteria established and final report form written

· Ongoing QC procedures established and documented

· UNMODIFIED FDA-approved or FDA-cleared tests (labeled “for in vitro diagnostic use”) require VERIFICATION of these performance characteristics published in the manufacturer’s package insert:

· accuracy data

· precision data

· reportable range (clinical reportable range and linearity)

· linear range (for quantitative assays)

· reference intervals (normal values) for laboratory patient population

· MODIFIED FDA-approved tests or non-FDA cleared tests (i.e. Laboratory Developed Procedure (LDP), previously Laboratory Developed Test – LDT) require ESTABLISHMENT of the following performance characteristics:

· accuracy data

· precision data

· analytic sensitivity (lower limit of target detection)

· analytic specificity

· reportable range

· linear range (for quantitative assays)

· reference intervals (normal values) for laboratory patient population

· For some tests:

· efficiency or call rate for genotyping assays (for those assays in which a large number of samples are available)

· specimen stability

· carryover

· Assay Design:

· Define the requirements of the test (intended use, test method, expected performance characteristics as listed above)

· Review the literature to support evidence for clinical utility and clinical validity of the test

· Assess clinical indication for the test

· Define target population

· Define purpose of the test (e.g., screening, diagnosis, prognosis, monitoring)

· Choose pertinent specimen types

· Establish criteria for sample rejection

· Sample age

· Sample quantity

· Preferred anticoagulant for collection tubes

· Establish minimal acceptable performance criteria for the test (e.g., turnaround time (TAT), coefficient of variation of the assay)

· Consider the role of test result in patient management

· Assess technical feasibility of assay implementation (e.g., right equipment, manpower, enough samples to justify implementation of the assay)

· Perform initial optimization studies to establish assay protocol and parameters before starting the validation

· Analytical validation:

· Accuracy:

· Per the Standards for the Reporting of Diagnostic Accuracy (STARD)

· the amount of agreement between the index test (under development) and the reference standard (best available method, or method already established in the lab)

· analyze a known sample (concentration or result, or both)

· elements of accuracy that should be addressed:

· sensitivity

· specificity

· positive predictive value (PPV)

· negative predictive value (NPV)

· false-positive rate

· false-negative rate
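
The accuracy elements listed above can all be derived from a 2x2 table of index test vs. reference standard. A minimal sketch, assuming hypothetical counts (the function name and the numbers are illustrative, not from the AMP document):

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Accuracy elements from a 2x2 table (index test vs. reference standard).

    tp/fn: reference-positive samples called positive/negative by the index test
    tn/fp: reference-negative samples called negative/positive by the index test
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


# Hypothetical validation set: 50 reference-positive, 50 reference-negative
metrics = accuracy_metrics(tp=45, fp=2, fn=5, tn=48)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that PPV and NPV depend on the proportion of positives in the sample set, so values from an enriched validation set may not carry over to the intended-use population.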

· For sequencing assays:

· It is important to establish that the test is capable of detecting appropriate representative types of DNA changes (e.g. point mutations, deletions, insertions)

· the number of possible mutations essentially precludes the use of reference materials that cover every possible mutation

· For quantitative assays, compare results between the new method and a “reference” method or a method already established in the lab.

· evaluate bias (difference between new, comparative method) in one of the following ways:

· check CAP limits for passing proficiency testing

· t-test to calculate statistically significant difference in the mean

· linear regression analysis:

· plot reference (x) vs. new method (y) data

· calculate linear regression statistics (ideally: slope = 1, intercept = 0, r = 0.99)

· establish criteria for accepting results (ex. > 95 % confidence interval, +/- 2 SD)
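
The regression and bias steps above can be sketched with plain least squares. The paired values below are hypothetical (log10 units), and any acceptance limits remain the laboratory's choice; nothing here is prescribed by the AMP document.

```python
import math


def linear_regression(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def mean_bias(x, y):
    """Average difference (new method minus reference method)."""
    return sum(yi - xi for xi, yi in zip(x, y)) / len(x)


# Hypothetical paired results on the same specimens
reference = [2.0, 3.0, 4.0, 5.0, 6.0]   # x: reference method
new_method = [2.1, 3.0, 4.1, 5.0, 6.1]  # y: new method

slope, intercept, r = linear_regression(reference, new_method)
bias = mean_bias(reference, new_method)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.4f} bias={bias:.3f}")
```

A slope near 1 with a non-zero intercept, as in this made-up data set, suggests a constant offset rather than a proportional error.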

· results using new method on certified reference materials

· test specimens from the anticipated patient population

· sex, age, race, etc.

· choose appropriate data analysis techniques

· choose appropriate reference methods

· choose appropriate comparative methods

· appropriate number of specimens depends on many factors including:

· complexity of the assay

· frequency of targets/alleles in the intended use population

· established accuracy of reference methods

· whether test is FDA-approved, FDA-cleared, modified FDA-approved, or LDP

· extent and type of validation needed for a particular test is left to the discretion of the laboratory director

· FDA-approved methods, example:

· suggest 20-40 samples that span the entire reportable range for quantitative assays, and different possible genotypes for genotyping assays

· Analytical Sensitivity

· ability of a test to detect a mutation or disease when that mutation/disease is present

· lower limit of detection (the lowest concentration of analyte that the assay can detect)

· 95% LLOD (lowest concentration of analyte that the assay can detect 95% of the time)

· analyte per mL of sample

· calculate for each target singly

· measure samples a number of times under different conditions

· use titrations

· test individuals who are known to have the condition being tested

· compare to a "gold standard" or another validated method in the lab

· input range (for genotyping assays) - acceptable range within which the multiplex assay yields accurate results for all variants tested
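
One simple way to estimate the 95% LLOD from a titration series is to tabulate the hit rate at each concentration and take the lowest level that still reaches 95% detection. This is an empirical sketch with hypothetical data and function names; a probit fit is a common alternative and may be preferred when replicate numbers allow.

```python
def hit_rates(results):
    """results maps concentration -> list of booleans (True = detected)."""
    return {conc: sum(calls) / len(calls) for conc, calls in results.items()}


def empirical_lod95(results, threshold=0.95):
    """Lowest concentration at which it and every higher level meet the threshold.

    Returns None if even the highest concentration misses the threshold.
    """
    rates = hit_rates(results)
    lod = None
    for conc in sorted(rates, reverse=True):
        if rates[conc] >= threshold:
            lod = conc
        else:
            break
    return lod


# Hypothetical titration: 20 replicates per level, analyte per mL of sample
titration = {
    100: [True] * 20,
    50: [True] * 20,
    25: [True] * 19 + [False],
    10: [True] * 14 + [False] * 6,
    5: [True] * 6 + [False] * 14,
}

rates = hit_rates(titration)
lod95 = empirical_lod95(titration)
print(rates)
print("Empirical 95% LLOD:", lod95)
```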

· Analytical specificity

· ability of a test to give a normal (negative) result in specimens without the mutation or disease being tested

· ability of a test to detect the analyte without cross-reacting with other substances or genetically or biologically similar microbes

· The maximum amount of a potentially interfering substance the assay can tolerate without causing actual interference or adversely affecting the test results should be determined

· interference studies:

· spike the specimens with interfering agents (spiked vs. unspiked)

· studies should be performed for each specimen matrix used in the assay

· RNA copurified with varying levels of DNA and vice versa

· a panel of closely-related organisms/alleles should be assessed to determine cross-reactivity

· test individuals with other conditions in the differential diagnosis, and/or known to be negative for the condition being tested, to determine false positive rate

· sources / substances that might interfere:

· poor sampling

· lack of sample stabilizer

· maternal cell contamination

· cross contamination during sample processing

· normal tissue included with diseased tissue

· bacteria

· endogenous substances (hemoglobin, cholesterol, triglycerides)

· exogenous substances (medications, anticoagulants, residual sample processing, stabilization reagents)

· Precision

· getting the same results with repetition of the assay

· metrics:

· mean

· standard deviation

· coefficient of variation (SD/mean)

· intra-assay variation (repeatability):

· various sample concentrations (suggest at least 3 concentrations)

· low concentration can be 2-4 x the established LoD

· high concentration should be close to the 99th percentile of tested concentrations

· various patterns of variants

· inter-assay variation (reproducibility):

· different days

· different operators

· different equipment

· within run, calculate mean, SD, and CV (coefficient of variation):

· run one sample several times in one run

· between runs, calculate mean, SD, and CV (coefficient of variation):

· run one or more samples in several different runs over several days

· compare %CV to manufacturer's claim for FDA-approved assays

· evaluate the cause if higher %CV is found

· take control samples that cover different genotypes

· ex. suggest at least one homozygous mutant, one heterozygous mutant, and one homozygous WT

· run in triplicate per run for 5 days

· include representation of rare alleles

· test the ability to discriminate similar and adjacent mutations where possible

· sources of variability:

· operator (most common)

· reagent lot (2nd most common)

· instrument

· sample concentration

· sample source

· run

· time of day

· laboratory environment

· apply to entire assay as applicable, from extraction and amplification to detection
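
The within-run and between-run calculations above can be sketched as follows, using one control run in triplicate per run for 5 days. The values are hypothetical, and taking the CV of the run means as the between-run figure is a simplification; a formal variance-component (ANOVA) analysis would separate the two sources more rigorously.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (SD / mean), expressed as a percentage."""
    return stdev(values) / mean(values) * 100


# Hypothetical control results: triplicate per run, one run per day
runs = {
    "day1": [100.0, 102.0, 98.0],
    "day2": [101.0, 103.0, 99.0],
    "day3": [99.0, 101.0, 97.0],
    "day4": [100.0, 101.0, 99.0],
    "day5": [100.0, 100.5, 99.5],
}

# Within-run (repeatability): CV of the replicates inside each run
within_run_cv = {day: cv_percent(values) for day, values in runs.items()}

# Between-run (reproducibility), simplified: CV of the per-run means
run_means = [mean(values) for values in runs.values()]
between_run_cv = cv_percent(run_means)

for day, cv in within_run_cv.items():
    print(f"{day}: within-run CV = {cv:.2f}%")
print(f"between-run CV = {between_run_cv:.2f}%")
```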

· For Quantitative Assays:

· At least 3 sample concentrations that cover the clinically important decision levels

· Low concentration replicates can be two to four times the established level of detection (LoD)

· High concentrations should be close to the 99th percentile of tested concentrations / titers.

· Example: A single operator testing five replicates at three concentrations (low, medium, high) run for 3 days using a single lot of reagent measured on the same instrument (total of 15 replicates per concentration will be generated) (CLSI, EP15-A2). It may be necessary to conduct multiple repeatability validation studies to cover all testing variables (see above).

· Use a spreadsheet to calculate mean value, standard deviation and %CV for within-run and between-run precision, and percent of agreement between tests performed under two different conditions (a confidence interval should be calculated for the observed percent agreement). For FDA-approved assays, compare to the manufacturer’s claim; if a higher %CV is obtained, evaluate the cause

· Compare precision to clinically acceptable variation (e.g. for HCV quantitative assays, changes beyond 0.5 log are considered true changes and not intrinsic test variation)
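
For the percent agreement between two conditions mentioned above, the notes do not say which confidence interval to use. The sketch below assumes the Wilson score interval, a common choice for proportions near 100%; the genotype calls are hypothetical.

```python
import math


def count_agreement(results_a, results_b):
    """Return (number of matching calls, total paired calls)."""
    matches = sum(1 for a, b in zip(results_a, results_b) if a == b)
    return matches, len(results_a)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical calls for the same 20 samples under two conditions
condition_a = ["WT"] * 10 + ["HET"] * 6 + ["MUT"] * 4
condition_b = list(condition_a)
condition_b[12] = "WT"  # one discordant call

matches, total = count_agreement(condition_a, condition_b)
low, high = wilson_interval(matches, total)
print(f"agreement = {matches}/{total} = {matches / total:.1%}")
print(f"95% CI (Wilson) = {low:.1%} to {high:.1%}")
```

With only 20 samples the interval is wide, which is one reason to report it alongside the point estimate.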

· For Genotyping Assays:

· For assays that utilize melting temperature (Tm): select control samples that cover the different genotypes (e.g. to identify prothrombin 20210G>A mutation, run at least one homozygous mutant, one heterozygous and one homozygous wild type sample, respectively). Run replicates on each run over. Calculate the average Tm and SD of the Tm. The Tm values have to be within the range of the manufacturer’s claim.

· For assays that require peak size to determine number of repeats: e.g., for Fragile X by PCR, take 3 control samples of known size within the normal range (≤ 49 repeats), 3 “low” pre-mutations (50-79 repeats), and 3 “high” pre-mutations (80-200 repeats). Calculate the average size and SD. Acceptable range for Fragile X PCR is +/- 3 repeats.

· Include representation of rare alleles to ensure that their presence can be detected with precision.

· Test the ability of the system/assay to discriminate similar and adjacent mutations, where possible, e.g., BRAF V600E (c.1799T>A) and V600K (c.1798_1799GT>AA)
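
The sizing check for repeat assays described above can be sketched as below. The +/- 3 repeat tolerance comes from the notes; the control names, expected sizes, and measurements are hypothetical. The same pattern would apply to Tm-based assays, with the manufacturer's Tm range in place of the repeat tolerance.

```python
from statistics import mean, stdev


def size_precision(measured, expected, tolerance=3):
    """Return (average size, SD, True if every replicate is within tolerance)."""
    within = all(abs(m - expected) <= tolerance for m in measured)
    return mean(measured), stdev(measured), within


# Hypothetical controls: expected repeat number and replicate measurements
controls = {
    "normal_30": (30, [30, 29, 31]),
    "low_premutation_60": (60, [60, 62, 59]),
    "high_premutation_100": (100, [100, 104, 101]),
}

summary = {}
for name, (expected, measured) in controls.items():
    summary[name] = size_precision(measured, expected)
    avg, sd, ok = summary[name]
    print(f"{name}: mean={avg:.1f} SD={sd:.2f} within +/-3 repeats: {ok}")
```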

· Assay linearity (quantitative assays)

· series of standards, or serial dilutions of a known standard or sample

· graph known values (x) vs. measured values (y)

· calculate regression (ideally slope 1.0, intercept 0)
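
For a serial dilution, the linearity check above is often done on log10-transformed values so that each dilution step carries equal weight. A minimal sketch with a hypothetical 10-fold dilution series; the log transform and the per-level deviation check are assumptions added for illustration, not requirements stated in the notes.

```python
import math


def fit_line(x, y):
    """Least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical 10-fold dilution series (copies/mL): known vs. measured
known = [1e6, 1e5, 1e4, 1e3, 1e2]
measured = [1.1e6, 9.5e4, 1.0e4, 1.05e3, 90.0]

log_known = [math.log10(v) for v in known]
log_measured = [math.log10(v) for v in measured]

slope, intercept = fit_line(log_known, log_measured)

# Deviation of each level from its expected value, in log10 units
deviations = [m - k for k, m in zip(log_known, log_measured)]
max_deviation = max(abs(d) for d in deviations)

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"largest deviation from expected: {max_deviation:.3f} log10")
```

Levels whose deviation grows at the ends of the series help locate where the linear range stops.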

· Reference and Testing Material

· reference materials (RM) are used for:

· calibration of the measuring system

· assessment of a measurement procedure

· assigning values to materials

· quality control

· RM selected based on the needs of the assay, the methodology, and availability

· types of RM

· genomic DNA

· mimics patient sample in terms of complexity

· recombinant plasmids or synthesized oligonucleotides

· can control for multiple alleles in a single reaction

· examples of RM:

· human DNA

· bacterial and viral genomic DNA

· mitochondrial DNA

· synthetic DNA

· plasmids containing human DNA

· amplicons

· in vitro transcripts

· synthetic oligonucleotides

· recombinant DNA

· phage and phage protein packaged nucleic acid

· genetically modified cell lines

· Resources for such reference materials are listed in this document

· AMP maintains a collection of links to laboratory testing reference and validation materials and providers at http://www.amp.org/committees/clinical_practice/ValidationResources.cfm

· commercially available

· another laboratory

· for sequencing assays, the number of possible mutations essentially precludes the use of reference materials

· reference materials for quantitative methods:

· high, middle, and low quantitative results

· samples may be mixed to create mid-point samples

· samples may be diluted to ensure a high result is within analytical reference range

· aliquots can be stored at -20C for DNA or -70C for RNA

· calibration verification q6 mo.

· recovery study (spike samples with known amount of standard and measure)

· avoid plasma samples that were frozen and thawed > 3 times
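
The recovery study above comes down to one calculation: how much of the spiked amount is actually measured. A minimal sketch with hypothetical numbers; the acceptance window is a placeholder that each laboratory would set for itself.

```python
def percent_recovery(spiked_result, unspiked_result, amount_added):
    """Percent of the spiked standard recovered by the assay."""
    return (spiked_result - unspiked_result) * 100 / amount_added


# Hypothetical values, all in the same units (e.g., copies/mL)
unspiked = 200.0      # measured in the sample before spiking
added = 1000.0        # known amount of standard spiked in
spiked = 1150.0       # measured in the sample after spiking

recovery = percent_recovery(spiked, unspiked, added)

# Placeholder acceptance window (assumption, not from the guideline)
acceptable = 80.0 <= recovery <= 120.0
print(f"recovery = {recovery:.1f}% (acceptable: {acceptable})")
```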

· Clinical Validation:

· Clinical validity:

· ability of the test to detect or predict the associated disorder (phenotype)

· http://www.cdc.gov/genomics/gTesting/ACCE.htm

· Clinical Utility:

· how useful the test is in the diagnosis or treatment of patients

· Clinical Sensitivity:

· proportion of patients with the mutation/disease who have a positive test result

· Clinical Specificity:

· proportion of patients who lack the mutation/disease who have a negative test result

· Assay optimization for LDTs:

· extraction optimization

· amplification optimization

· detection optimization

· interpretation and reporting optimization

· evaluate these variables

· "raw material" (extracted DNA or RNA)

· specimen source, storage, transport, stability, integrity

· sufficient input amounts and volumes of specimens to determine the correct dynamic range for each specimen type

· quality of DNA/RNA

· length, molecular weight, purity - optical density at 260nm/280nm, 260/230, gel electrophoresis, bioanalyzer

· quantity (concentration) of DNA/RNA

· matrix - related to specimen type

· reagents, including stores and aliquots/working solutions

· instrument calibration

· well-to-well cross-contamination (for automated nucleic acid extraction)
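
The optical density checks listed above (260/280 and 260/230) and the concentration estimate can be sketched as below. The conversion factors are the standard approximations for a 1 cm path length (A260 of 1.0 is about 50 ng/µL dsDNA or 40 ng/µL RNA); the readings are hypothetical. Ratios of roughly 1.8 (DNA) or 2.0 (RNA) for 260/280 are commonly cited rules of thumb, not limits taken from the notes.

```python
# Approximate ng/µL per A260 unit at 1 cm path length
A260_FACTORS = {"dsDNA": 50.0, "RNA": 40.0}


def purity_ratios(a260, a280, a230):
    """Return (A260/A280, A260/A230) purity ratios."""
    return a260 / a280, a260 / a230


def concentration_ng_per_ul(a260, nucleic_acid="dsDNA", dilution_factor=1):
    """Estimate nucleic acid concentration from absorbance at 260 nm."""
    return a260 * A260_FACTORS[nucleic_acid] * dilution_factor


# Hypothetical spectrophotometer readings for a 1:10 diluted DNA extract
a260, a280, a230 = 0.50, 0.27, 0.24

ratio_280, ratio_230 = purity_ratios(a260, a280, a230)
conc = concentration_ng_per_ul(a260, "dsDNA", dilution_factor=10)

print(f"A260/A280 = {ratio_280:.2f}, A260/A230 = {ratio_230:.2f}")
print(f"estimated concentration = {conc:.0f} ng/µL")
```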

· Reportable range:

· Definition: The span of test result values over which the laboratory can establish or verify the accuracy of the instrument or test system measurement response

· Linear Range:

· Definition: The range where the test values are proportional to the concentration of the analyte in the sample

· Reference range:

· reference intervals (normal range):

· normal controls (patients without the condition being tested)

· if clinically indicated, consider range of individuals of both sexes, ethnicity, pregnancy status, etc.

· consider individuals with other medical conditions who will be tested

· number of samples is at the discretion of the laboratory director

· abnormal range:

· include heterozygous and homozygous, if possible

· at least 20 samples if possible, preferably more

· Other validation references to check out:

· Jennings L, Van Deerlin VM, Gulley ML. Recommended principles and practices for validating clinical molecular pathology tests. Arch Pathol Lab Med. 2009;133:743-755

· Marlowe, E.M. and Wolk, D.M. 2009. Molecular Method Verification. Diagnostic Molecular Microbiology, ASM Press, Washington, DC., in press

· AMP EGFR testing guidelines 2013

· Validation should be performed for each specimen type likely to be encountered, and testing should be reported only on validated specimen types.

· tumour enrichment procedures should also be assessed during test validation

· specificity of ultrasensitive methods (< 1% allele fraction) must receive additional attention

· multiple negative lung cancer specimens

· multiple no-template controls

References:

· Molecular Diagnostic Assay Validation. Association for Molecular Pathology Clinical Practice Committee (Oct. 2009)

· Lindeman et al. Molecular Testing Guideline for Selection of Lung Cancer Patients for EGFR and ALK Tyrosine Kinase Inhibitors: Guideline from the College of American Pathologists, International Association for the Study of Lung Cancer, and Association for Molecular Pathology. Arch Pathol Lab Med 2013;137:828-860.