Molecular Test Validation Guidelines
·
notes from AMP Molecular Diagnostic Assay
Validation (2009) and its update (2014)
·
Checklist:
·
adequate number of each of the expected
specimen types
·
reference method comparison
·
reference range and/or reportable ranges
determined
·
written procedure signed off by the
Laboratory Director
·
Reporting criteria established and final
report form written
·
Ongoing QC procedures established and
documented
·
UNMODIFIED FDA-approved or FDA-cleared
tests (labeled “for in vitro diagnostic use”) require VERIFICATION of these
performance characteristics published in the manufacturer’s package insert:
·
accuracy data
·
precision data
·
reportable range (clinical reportable
range and linearity)
·
linear range (for quantitative
assays)
·
reference intervals (normal values) for
laboratory patient population
·
MODIFIED FDA-approved tests or non-FDA
cleared tests (i.e. Laboratory Developed Procedure (LDP), previously Laboratory
Developed Test – LDT) require ESTABLISHMENT of the following performance
characteristics:
·
accuracy data
·
precision data
·
analytic sensitivity (lower limit of
target detection)
·
analytic specificity
·
reportable range
·
linear range (for quantitative assays)
·
reference intervals (normal values) for
laboratory patient population
·
For some tests:
·
efficiency or call rate for genotyping
assays (for those assays in which a large number of samples are available)
·
specimen stability
·
carryover
·
Assay Design:
·
Define the requirements of the test
(intended use, test method, expected performance characteristics as listed
above)
·
Review the literature to support evidence
for clinical utility and clinical validity of the test
·
Assess clinical indication for the test
·
Define target population
·
Define purpose of the test (e.g.,
screening, diagnosis, prognosis, monitoring)
·
Choose pertinent specimen types
·
Establish criteria for sample rejection
·
Sample age
·
Sample quantity
·
Preferred anticoagulant for collection
tubes
·
Establish minimum acceptable performance
criteria for the test (e.g., turnaround time (TAT), coefficient of variation of the assay)
·
Consider the role of test result in
patient management
·
Assess technical feasibility of assay
implementation (e.g., right equipment, manpower, enough samples to justify
implementation of the assay)
·
Perform initial optimization studies to establish assay protocol
and parameters before starting the validation
·
Analytical validation:
·
Accuracy:
·
Per the Standards for the Reporting of
Diagnostic Accuracy (STARD)
·
the amount of agreement between the
index test (under development) and the reference standard (best available
method, or method already established in the lab)
·
analyze a known sample (concentration
or result, or both)
·
elements of accuracy that should be
addressed:
·
sensitivity
·
specificity
·
positive predictive value (PPV)
·
negative predictive value (NPV)
·
false-positive rate
·
false-negative rate
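The accuracy elements listed above all come from the same 2x2 table of index test results versus reference standard results. A minimal sketch, assuming hypothetical counts (the function name and the numbers are illustrative, not from the AMP documents):

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Accuracy elements from a 2x2 table (index test vs. reference standard).

    tp/fn: reference-positive samples called positive/negative by the index test
    tn/fp: reference-negative samples called negative/positive by the index test
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


# Hypothetical validation set: 50 reference-positive, 50 reference-negative samples
metrics = accuracy_metrics(tp=48, fp=1, fn=2, tn=49)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that PPV and NPV depend on the proportion of positives in the sample set, so values from an enriched validation set may not carry over to the intended-use population.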
·
For sequencing assays:
·
It is important to establish that the
test is capable of detecting appropriate representative types of DNA changes
(e.g. point mutations, deletions, insertions)
·
the number of possible mutations
essentially precludes the use of reference materials that cover every possible
mutation
·
For quantitative assays, compare
results between the new method and “reference” method or method already
established in the lab. Evaluate bias (difference between the new and
comparative methods) in one of the following ways:
·
check CAP limits for passing
proficiency testing
·
t-test to calculate statistically
significant difference in the mean
·
linear regression analysis:
·
plot reference (x) vs. new method (y)
data
·
calculate linear regression statistics
(ideally: slope = 1, intercept = 0, r ≥ 0.99)
·
establish criteria for accepting
results (e.g., 95% confidence interval, +/- 2 SD)
·
results using new method on certified
reference materials
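The regression comparison described above (reference on x, new method on y) can be sketched in plain Python. The paired results below are made-up log10 values, and the function name is an assumption for illustration; ordinary least squares is used here, although other regression models may suit method comparison better when the reference method also has measurement error.

```python
from math import sqrt


def method_comparison(reference, new):
    """Least-squares regression of new-method (y) on reference-method (x) results.

    Returns slope, intercept, correlation coefficient r, and mean bias (new - reference).
    """
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(new) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    syy = sum((y - mean_y) ** 2 for y in new)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, new))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    bias = mean_y - mean_x
    return slope, intercept, r, bias


# Hypothetical paired results (log10 copies/mL) spanning the reportable range
reference = [2.0, 3.0, 4.0, 5.0, 6.0]
new = [2.1, 2.9, 4.1, 5.0, 6.1]
slope, intercept, r, bias = method_comparison(reference, new)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.4f} mean bias={bias:.3f}")
```

The laboratory's own acceptance criteria (for slope, intercept, r, and bias) would then be applied to these statistics.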
·
test specimens from the anticipated
patient population
·
sex, age, race, etc.
·
choose appropriate data analysis
techniques
·
choose appropriate reference methods
·
choose appropriate comparative methods
·
appropriate number of specimens
depends on many factors including:
·
complexity of the assay
·
frequency of targets/alleles in the
intended use population
·
established accuracy of reference
methods
·
whether test is FDA-approved,
FDA-cleared, modified FDA-approved, or LDP
·
extent and type of validation needed
for a particular test is left to the discretion of the laboratory director
·
FDA-approved methods, example:
·
suggest 20-40 samples that span the
entire reportable range for quantitative assays, and different possible
genotypes for genotyping assays
·
Analytical Sensitivity
·
ability of a test to detect a mutation
or disease when that mutation/disease is present
·
lower limit of detection (the lowest
concentration of analyte that the assay can detect)
·
95% LLOD (lowest concentration of analyte that the assay can detect 95% of the time)
·
analyte per mL of sample
·
calculate for each target singly
·
measure samples a number of times
under different conditions
·
use titrations
·
test individuals who are known to have
the condition being tested
·
compare to a "gold standard"
or another validated method in the lab
·
input range (for genotyping assays) -
acceptable range within which the multiplex assay yields accurate results for
all variants tested
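One simple way to read a 95% LLOD off a titration series is by hit rate: test replicates at each dilution and take the lowest concentration that is still detected at least 95% of the time. The sketch below assumes that approach and uses invented replicate counts; a probit fit is a common alternative and is not shown here.

```python
def lowest_95_percent_level(hits_by_concentration, threshold=0.95):
    """Lowest concentration detected in >= threshold of replicates.

    hits_by_concentration maps concentration -> (replicates detected, replicates tested).
    Levels are walked from highest to lowest, stopping at the first level that
    falls below the threshold, so every level above the result also passes.
    Returns None if even the highest level fails.
    """
    qualifying = None
    for concentration in sorted(hits_by_concentration, reverse=True):
        detected, tested = hits_by_concentration[concentration]
        if detected / tested >= threshold:
            qualifying = concentration
        else:
            break
    return qualifying


# Hypothetical titration: copies/mL -> (detected, tested)
titration = {
    1000: (20, 20),
    500: (20, 20),
    250: (19, 20),
    125: (16, 20),
    62.5: (9, 20),
}
print("95% LLOD (hit-rate estimate):", lowest_95_percent_level(titration), "copies/mL")
```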
·
Analytical specificity
·
ability of a test to give a normal
(negative) result in specimens without the mutation or disease being tested
·
ability of a test to detect the analyte without cross-reacting with other substances or
genetically or biologically similar microbes
·
The maximum amount of a potentially
interfering substance the assay can tolerate without causing actual
interference or adversely affecting the test results should be determined
·
interference studies:
·
spike the specimens with interfering
agents (spiked vs. unspiked)
·
studies should be performed for each
specimen matrix used in the assay
·
RNA copurified
with varying levels of DNA and vice versa
·
a panel of closely-related
organisms/alleles should be assessed to determine cross-reactivity
·
test individuals with other conditions
in the differential diagnosis, and/or known to be negative for the condition
being tested, to determine false positive rate
·
sources / substances that might
interfere:
·
poor sampling
·
lack of sample stabilizer
·
maternal cell contamination
·
cross contamination during sample
processing
·
normal tissue included with diseased
tissue
·
bacteria
·
endogenous substances (hemoglobin,
cholesterol, triglycerides)
·
exogenous substances (medications,
anticoagulants, residual sample-processing and stabilization reagents)
·
Precision
·
getting the same results with
repetition of the assay
·
metrics:
·
mean
·
standard deviation
·
coefficient of variation (SD/mean)
·
intra-assay variation (repeatability):
·
various sample concentrations (suggest
at least 3 concentrations)
·
low concentration can be 2-4 x established LoD
·
high concentration should be close to 99th
percentile of tested concentrations
·
various patterns of variants
·
inter-assay variation
(reproducibility):
·
different days
·
different operators
·
different equipment
·
within run, calculate mean, SD, and CV
(coefficient of variation):
·
run one sample several times in one
run
·
between runs, calculate mean, SD, and
CV (coefficient of variation):
·
run one or more samples in several
different runs over several days
·
compare %CV to manufacturer's claim
for FDA-approved assays
·
evaluate the cause if higher %CV is
found
·
take control samples that cover
different genotypes
·
ex. suggest at least one homozygous
mutant, one heterozygous mutant, and one homozygous WT
·
run in triplicate per run for 5 days
·
include representation of rare alleles
·
test the
ability to discriminate similar and adjacent mutations where possible
·
sources of variability:
·
operator (most common)
·
reagent lot (2nd most common)
·
instrument
·
sample concentration
·
sample source
·
run
·
time of day
·
laboratory environment
·
apply to entire assay as applicable, from
extraction and amplification to detection
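The within-run and between-run calculations above (mean, SD, CV) can be sketched with the standard library. The replicate values are invented, and the between-run figure here is simply the SD of the run means, which is a simplification rather than a full variance-components analysis.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, SD, %CV) for a list of replicate results."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return m, sd, 100 * sd / m


def precision_summary(runs):
    """runs: one inner list of replicate results per run (e.g., per day).

    Within-run: statistics of the replicates inside each run.
    Between-run: statistics of the run means (simplified approach).
    """
    within_run = [summarize(run) for run in runs]
    between_run = summarize([mean(run) for run in runs])
    return within_run, between_run


# Hypothetical data: one sample, 5 replicates per run, 3 runs (log10 copies/mL)
runs = [
    [4.02, 3.98, 4.05, 4.00, 3.97],
    [4.10, 4.06, 4.08, 4.03, 4.09],
    [3.95, 3.99, 4.01, 3.96, 4.00],
]
within_run, between_run = precision_summary(runs)
for run_number, (m, sd, cv) in enumerate(within_run, start=1):
    print(f"run {run_number}: mean={m:.3f} SD={sd:.3f} CV={cv:.2f}%")
print("between runs: mean={:.3f} SD={:.3f} CV={:.2f}%".format(*between_run))
```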
·
For Quantitative Assays:
·
At least 3 sample concentrations that
cover the clinically important decision levels
·
Low concentration replicates can be
two to four times the established level of detection (LoD)
·
High concentrations should be close to
the 99th percentile of tested concentrations / titers.
·
Example: A single operator testing
five replicates at three concentrations (low, medium, high) run for 3 days
using a single lot of reagent measured on the same instrument (total of 15
replicates per concentration will be generated) (CLSI, EP15-A2). It may be
necessary to conduct multiple repeatability validation studies to cover all
testing variables (see above).
·
Use a spreadsheet to calculate mean
value, standard deviation and %CV for within-run and between-run precision, and
percent of agreement between tests performed under two different conditions
(confidence interval should be calculated for the observed percent agreement).
For FDA-approved assays, compare to the manufacturer’s claim; if a higher %CV
is obtained, then evaluate the cause
·
Compare precision to clinically
acceptable variation (e.g. for HCV quantitative assays, changes beyond 0.5 log
are considered true changes and not intrinsic test variation)
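For the percent agreement between two test conditions, the notes ask for a confidence interval but do not name a method. The sketch below assumes a Wilson score interval at 95% (z = 1.96); the counts are hypothetical.

```python
from math import sqrt


def percent_agreement_with_ci(agreements, total, z=1.96):
    """Observed percent agreement with a Wilson score confidence interval.

    agreements: number of paired results that agree between the two conditions
    total: number of paired results compared
    z: normal quantile (1.96 for a 95% interval)
    Returns (percent agreement, lower bound %, upper bound %).
    """
    p = agreements / total
    denominator = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denominator
    half_width = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denominator
    return 100 * p, 100 * (centre - half_width), 100 * (centre + half_width)


# Hypothetical comparison: 38 of 40 paired results agree
agreement, lower, upper = percent_agreement_with_ci(38, 40)
print(f"agreement={agreement:.1f}% (95% CI {lower:.1f}% to {upper:.1f}%)")
```

With small sample numbers the interval is wide, which is one reason the number of paired results matters when setting acceptance criteria.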
·
For Genotyping Assays:
·
For assays that utilize melting temperature
(Tm): select control samples that cover the different genotypes (e.g. to
identify prothrombin 20210G>A mutation, run at least one homozygous mutant,
one heterozygous and one homozygous wild type sample, respectively). Run
replicates on each run over. Calculate the average Tm and SD of the Tm. The Tm
values have to be within the range of the manufacturer’s claim.
·
For assays that require peak size to
determine number of repeats: e.g., for Fragile X by PCR, take 3 control samples
of known size within the normal range (≤ 49 repeats), 3 “low”
pre-mutations (50-79 repeats), and 3 “high” pre-mutations (80-200 repeats).
Calculate the average size and SD. The acceptable range for Fragile X PCR is +/- 3
repeats.
·
Include representation of rare alleles
to ensure that their presence can be detected with precision.
·
Test the ability of the system/assay
to discriminate similar and adjacent mutations where possible, e.g., BRAF V600E
(c.1799T>A) and V600K (c.1798_1799GT>AA)
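The Fragile X sizing example above can be sketched as follows: group controls by the repeat ranges given in these notes, then compute the average size and SD of replicate measurements and check them against the +/- 3 repeat range. The function names and the replicate values are assumptions for illustration.

```python
from statistics import mean, stdev


def repeat_category(repeats):
    """Category of a control by repeat number, using the ranges in these notes."""
    if repeats <= 49:
        return "normal"
    if repeats <= 79:
        return "low pre-mutation"
    if repeats <= 200:
        return "high pre-mutation"
    return "outside the ranges listed in these notes"


def sizing_summary(expected_repeats, measured_repeats, tolerance=3):
    """Average size, SD, and whether every replicate is within +/- tolerance repeats."""
    average = mean(measured_repeats)
    sd = stdev(measured_repeats)
    within_range = all(abs(m - expected_repeats) <= tolerance for m in measured_repeats)
    return average, sd, within_range


# Hypothetical controls: known repeat number -> replicate measurements
controls = {30: [29, 30, 31], 60: [59, 61, 62], 95: [94, 97, 99]}
for expected, measured in controls.items():
    average, sd, ok = sizing_summary(expected, measured)
    print(f"{repeat_category(expected)} control ({expected} repeats): "
          f"mean={average:.1f} SD={sd:.2f} within +/- 3: {ok}")
```

The same mean and SD calculation applies to melting-temperature assays, with the computed Tm compared against the manufacturer's claimed range instead of a repeat tolerance.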
·
Assay linearity (quantitative assays)
·
series of standards, or serial
dilutions of a known standard or sample
·
graph known values (x) vs. measured
values (y)
·
calculate regression (ideally slope
1.0, intercept 0)
·
Reference and Testing Material
·
reference materials (RM) are used for:
·
calibration of the measuring system
·
assessment of a measurement procedure
·
assigning values to materials
·
quality control
·
RM selected based on the needs of the
assay, the methodology, and availability
·
types of RM
·
genomic DNA
·
mimics patient sample in terms of
complexity
·
recombinant plasmids or synthesized
oligonucleotides
·
can control for multiple alleles in a
single reaction
·
examples of RM:
·
human DNA
·
bacterial and viral genomic DNA
·
mitochondrial DNA
·
synthetic DNA
·
plasmids containing human DNA
·
amplicons
·
in vitro transcripts
·
synthetic oligonucleotides
·
recombinant DNA
·
phage and phage protein packaged
nucleic acid
·
genetically modified cell lines
·
Resources for such reference materials
are listed in this document
·
AMP maintains a collection of links to
laboratory testing reference and validation materials and providers at
http://www.amp.org/committees/clinical_practice/ValidationResources.cfm
·
commercially available
·
another laboratory
·
for sequencing assays, the number of
possible mutations essentially precludes the use of reference materials that
cover every possible mutation
·
reference materials for quantitative
methods:
·
high, middle, and low quantitative
results
·
samples may be mixed to create
mid-point samples
·
samples may be diluted to ensure a
high result is within analytical reference range
·
aliquots can be stored at -20 °C for DNA or -70 °C for RNA
·
calibration verification every 6 months
·
recovery study (spike samples with
known amount of standard and measure)
·
avoid plasma samples that were frozen
and thawed > 3 times
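The recovery study mentioned above reduces to a short calculation: the increase measured after spiking, divided by the amount of standard added. A minimal sketch, with invented values and all quantities assumed to be in the same linear units (not log-transformed):

```python
def percent_recovery(unspiked_result, spiked_result, amount_added):
    """Percent of the spiked-in standard that the assay actually measures.

    unspiked_result: measured value of the sample before spiking
    spiked_result: measured value of the same sample after spiking
    amount_added: known amount of standard added (same units as the results)
    """
    return 100 * (spiked_result - unspiked_result) / amount_added


# Hypothetical example in copies/mL
recovery = percent_recovery(unspiked_result=1000, spiked_result=5800, amount_added=5000)
print(f"recovery = {recovery:.1f}%")
```

What counts as acceptable recovery is a laboratory-defined criterion and is not specified in these notes.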
·
Clinical Validation:
·
Clinical validity:
·
ability of the test to detect or
predict the associated disorder (phenotype)
·
http://www.cdc.gov/genomics/
·
Clinical Utility:
·
how useful the test is in the
diagnosis or treatment of patients
·
Clinical Sensitivity:
·
proportion of patients with the mutation/disease who have a positive test
result
·
Clinical Specificity:
·
proportion of patients who lack the
mutation/disease who have a negative test result
·
Assay optimization for LDTs:
·
extraction optimization
·
amplification optimization
·
detection optimization
·
interpretation and reporting
optimization
·
evaluate these variables
·
"raw material" (extracted
DNA or RNA)
·
specimen source, storage, transport,
stability, integrity
·
sufficient input amounts and volumes
of specimens to determine the correct dynamic range for each specimen type
·
quality of DNA/RNA
·
length, molecular weight, purity -
optical density at 260nm/280nm, 260/230, gel electrophoresis, bioanalyzer
·
quantity (concentration) of DNA/RNA
·
matrix - related to specimen type
·
reagents, including stores and
aliquots/working solutions
·
instrument calibration
·
well-to-well cross-contamination (for
automated nucleic acid extraction)
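The optical density checks listed above can be sketched as a small helper that computes the 260/280 and 260/230 ratios and an approximate concentration from A260. The conversion factors (about 50 ng/µL per A260 unit for double-stranded DNA and about 40 for RNA, at a 1 cm path length) and the rule-of-thumb ratios mentioned afterwards are general conventions, not values taken from the AMP documents; the readings are invented.

```python
# Approximate ng/uL per A260 unit at 1 cm path length (general convention, assumed here)
CONVERSION_FACTOR = {"dsDNA": 50.0, "RNA": 40.0}


def nucleic_acid_qc(a230, a260, a280, kind="dsDNA", dilution_factor=1.0):
    """Purity ratios and approximate concentration from absorbance readings."""
    return {
        "A260/A280": a260 / a280,
        "A260/A230": a260 / a230,
        "concentration_ng_per_uL": a260 * CONVERSION_FACTOR[kind] * dilution_factor,
    }


# Hypothetical spectrophotometer readings for an extracted DNA sample
qc = nucleic_acid_qc(a230=0.45, a260=0.90, a280=0.49, kind="dsDNA")
for name, value in qc.items():
    print(f"{name}: {value:.2f}")
```

As a commonly quoted rule of thumb, an A260/A280 near 1.8 for DNA or near 2.0 for RNA suggests little protein contamination; each laboratory would set its own acceptance limits during optimization.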
·
Reportable range:
·
Definition: The span of test result
values over which the laboratory can establish or verify the accuracy of the
instrument or test system measurement response
·
Linear Range:
·
Definition: The range where the test
values are proportional to the concentration of the analyte
in the sample
·
Reference range:
·
reference intervals (normal range):
·
normal controls (patients without the
condition being tested)
·
if clinically indicated, consider a range of individuals, including both
sexes and different ethnicities, pregnancy statuses, etc.
·
consider individuals with other
medical conditions who will be tested
·
number of samples is at the discretion
of the laboratory director
·
abnormal range:
·
include heterozygous and homozygous,
if possible
·
at least 20 samples if possible,
preferably more
·
Other validation references to check
out:
·
Jennings L, Van Deerlin
VM, Gulley ML. Recommended principles and practices for validating clinical molecular
pathology tests. Arch Pathol Lab Med.
2009;133:743-755
·
Marlowe, E.M. and Wolk,
D.M. 2009. Molecular Method Verification. Diagnostic Molecular Microbiology,
ASM Press, Washington, DC., in press
·
AMP EGFR testing guidelines 2013
·
Validation should be performed for each
specimen type likely to be encountered, and testing should be reported only on
validated specimen types.
·
tumour enrichment procedures
should also be assessed during test validation
·
specificity of ultrasensitive methods
(< 1% allele fraction) must receive additional attention
·
multiple negative lung cancer specimens
·
multiple no-template controls
·
References:
·
Molecular Diagnostic Assay
Validation. Association for Molecular
Pathology Clinical Practice Committee (Oct. 2009)
·
Lindeman et al. Molecular Testing Guideline for Selection of
Lung Cancer Patients for EGFR and ALK Tyrosine Kinase Inhibitors: Guideline
from the College of American Pathologists, International Association for the
Study of Lung Cancer, and Association for Molecular Pathology. Arch Pathol Lab Med
2013;137:828-860.
·