Section 04 – DNA Technology

From ICAR Wiki

Revision as of 09:32, 19 July 2024

Introduction

Advances in molecular biology, especially genomics, provide a new set of information to be incorporated into the animal industry. On the one hand, the use of molecular information may help enhance consumers' trust in the ability to monitor and control the animal production chain. On the other hand, molecular information will greatly contribute to the genetic improvement of animal traits through genomic breeding values, marker-assisted selection, gene introgression, heterosis prediction, pedigree validation/prediction, and determination of genetic defect carrier status. In most cases, the advantages of using molecular information via genomic evaluations come from improved accuracy of breeding values, shorter generation intervals, and increased selection intensity. Even with these advances, there is still a need for research and development in the search for associations between genetic markers and traits of interest, especially as new traits are included in national evaluation indexes. In addition, even with genomic information already incorporated into national selection schemes, an understanding of gene action, gene interactions, and differential gene expression is needed to avoid negative collateral effects. Cooperation between the animal industries and research is required for a successful and beneficial search for genetic information in commercial livestock populations.

Molecular Genetics

Genetic Markers

Genetic markers are the fundamental molecular tools for genomics, even as the type of marker used has changed. The first genetic marker associations in livestock were reported using blood typing in the 1960s; the technology then moved to microsatellites (MS) in the 1990s and, more recently, to Single Nucleotide Polymorphisms (SNP). SNP and MS are polymorphic DNA sequences (alleles) at a specific locus of a particular chromosome. While blood typing has been an ICAR-approved method of parentage verification, few, if any, commercial labs still offer this testing. For this reason, ICAR no longer recommends blood typing as the basis for parentage analysis in livestock species where MS or SNP technology is widely available.

Microsatellites

These are segments of DNA containing tandem repeats of simple motifs, usually of two or three nucleotides (di- or trinucleotide repeats). They are located throughout the genome, mostly in non-coding regions. Over time, repeat units are added to or lost from these regions, so each microsatellite can have multiple distinct alleles. Microsatellites are commonly used in many livestock species for parentage validation.

Single Nucleotide Polymorphism (SNP)

SNP are the most common type of genetic variation: each SNP represents a variation in a single nucleotide. There are millions of SNP located throughout the genome of every livestock species. For genomics, the most informative SNP have traditionally been those located either (a) in coding regions, where different alleles change the structure or function of the encoded protein, or (b) in non-coding regions involved in the regulation of the gene. For genomic breeding values, SNP located in other regions of the genome are also informative, as they can be in linkage disequilibrium with alleles that do cause a phenotypic change.

One of the big advantages of SNP is that they can be deployed on SNP arrays with a strong parallel processing capacity, whereby thousands or hundreds of thousands of SNP can be screened together cost-effectively and efficiently across a large number of animals. Currently, the largest livestock genotyping labs can process hundreds of thousands of animals a year on such arrays. The availability of these large SNP panels is therefore bolstering the search for mutations underlying genetic variation in simple and complex traits. It is also revolutionizing the speed at which trait-associated genes or gene regions are being discovered, as well as the adoption rate of genomic selection strategies. SNP genotypes have become the international standard for parentage analysis, and ICAR recommends this approach over microsatellites wherever possible because of its improved accuracy and the ease of comparing results between genotyping laboratories.

Current and Potential Uses of DNA Technologies

Parentage verification and parental assignment authentication

Prior to the emergence of SNP genotyping, parentage verification was the main commercial use of genetic markers. Traditionally, parentage testing was based on excluding a relationship (i.e. sire or dam) when an animal has a genotype inconsistent with the putative relationship. New trends in animal production are encouraging larger numbers of animals per farm in response to environmental and production-related constraints. In these large settings, multiple animals may be bred or give birth on the same day, which can result in more pedigree recording errors. As the cost of analysis decreases and the number of available genetic markers increases, breed societies are now able to build up pedigree records by using genetic markers to predict the pedigree of calves born in a herd at a given time. When a lower number (<200) of markers is used, this normally requires prior knowledge of the candidate sires and dams for a calf; with enough SNP, the correct parents can be predicted without prior knowledge, as long as the parents are also genotyped. The probability of assignment to the correct pair of animals depends on the number of markers used, the number of alleles per locus, the minor allele frequency in the population, the number of candidate parents, and the number of possible matings. The International Society for Animal Genetics (www.isag.us) recommends species-specific panels of microsatellite and SNP markers for this purpose, which can be accessed via the link provided in Appendix 1 (Link to SNP markers recommended by ISAG for parentage verification). For cattle, ICAR has developed a set of parentage SNP, ICAR554, which incorporates the ISAG-recommended panel and other highly informative SNP. This panel allows highly accurate parentage validation and discovery while not allowing accurate imputation to a higher density.
Therefore, the ICAR554 panel can be shared among countries and competitors for parentage analysis without fear of others using it to predict genomic breeding values. ICAR and the Interbull Centre collaborate in offering an international genotype exchange service, GenoEx, specifically for the exchange of SNP genotypes for the purposes of parentage analysis; it is described further in Chapter 5.

Traceability and authentication of animal products offered to consumers

Due to multiple crises, ranging from BSE outbreaks to ground beef containing horsemeat, and with increased consumer interest in where their food comes from, the traceability of meat products is of growing concern to the industry. Traceability is based on the availability of a verification and control system that monitors all relevant details throughout the entire livestock production chain. Since an individual's DNA sequence is unique (except for identical twins and clones) and does not change, it remains constant from 'conception to consumption'. Therefore, genetic markers allow one to match the DNA of an individual at birth to the final product.

Genetic markers for authenticating animal products carrying quality labels related to geographic location, or to specific breeds or their crosses, are, or will be, very useful. However, this requires the establishment of molecular standards or allele frequencies for each breed within a species. Much of this information is coming from studies of genetic diversity among breeds. Genomic regions subject to intense selection in each population are of particular interest. With a large enough set of SNP and genotyped purebred reference animals, it is also possible to predict the most likely breed composition of individuals.
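As an illustration of how breed composition could be predicted from purebred reference allele frequencies, the sketch below estimates breed proportions with an expectation-maximization (EM) algorithm while the reference frequencies are held fixed. It is a simplified, supervised-admixture style of calculation, not an ICAR or commercial method; it assumes SNP are independent (linkage is ignored), genotypes are coded as copies of the B allele (0, 1, 2), and the function name and the frequency clipping constant are hypothetical.

```python
from typing import List, Optional, Sequence

EPS = 1e-3  # hypothetical: keeps reference frequencies away from 0 and 1


def estimate_breed_proportions(
    genotypes: Sequence[Optional[int]],
    ref_freqs: Sequence[Sequence[float]],
    n_iter: int = 500,
    tol: float = 1e-8,
) -> List[float]:
    """Estimate breed proportions by EM with fixed reference allele frequencies.

    genotypes[j]    : copies of the B allele at SNP j (0, 1 or 2); None if missing.
    ref_freqs[j][k] : B-allele frequency of SNP j in purebred reference breed k.
    """
    n_breeds = len(ref_freqs[0])
    w = [1.0 / n_breeds] * n_breeds           # start from equal proportions
    for _ in range(n_iter):
        expected = [0.0] * n_breeds            # allele copies attributed to each breed
        n_alleles = 0
        for g, freqs in zip(genotypes, ref_freqs):
            if g is None:
                continue
            f = [min(max(x, EPS), 1.0 - EPS) for x in freqs]
            # E-step: posterior breed of origin of a B allele and of an A allele.
            pb = [w[k] * f[k] for k in range(n_breeds)]
            pa = [w[k] * (1.0 - f[k]) for k in range(n_breeds)]
            sb, sa = sum(pb), sum(pa)
            for k in range(n_breeds):
                expected[k] += g * pb[k] / sb + (2 - g) * pa[k] / sa
            n_alleles += 2
        if n_alleles == 0:
            raise ValueError("no called genotypes")
        # M-step: each proportion is the share of allele copies assigned to that breed.
        new_w = [e / n_alleles for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w


# Hypothetical example: 3 SNP and two reference breeds.
ref = [[0.9, 0.2], [0.1, 0.7], [0.5, 0.5]]
print(estimate_breed_proportions([2, 0, 1], ref))
```

In practice such estimates need many thousands of SNP and well-sampled reference breeds; with only a few SNP the proportions are very uncertain.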

Molecular genetic information for marker-assisted selection schemes

Quantitative traits are generally assumed to be controlled by a large number of genes. However, individual genes sometimes account for a significant amount of the variation in a trait. Such is the case for the Myostatin gene and double muscling in beef cattle, the DGAT1 gene and milk components in dairy cattle, and the Booroola fecundity gene and ovulation rate in sheep. Since the genotype of an animal does not change during its lifetime, using DNA information, either by identifying markers linked to QTL with effects on production traits or by identifying the gene itself together with its causative variant, is of great interest. Nevertheless, for complex traits there is a growing need for a sufficiently large marker set to incorporate molecular information into selection decisions. Including genomic information as a selection criterion is of special interest for traits that are difficult and costly to measure and/or are measured late in life. By 2022, >177,000 cattle, >34,000 swine, >16,000 chicken and >4,000 sheep QTL had been identified that are associated with economically important traits such as health, carcass, milk, fertility, and body conformation. The AnimalQTLdb database, housed at the National Animal Genome Research Program, contains up-to-date information on cattle, chicken, horse, pig, trout, and sheep QTL assembled from published data.

Recording schemes have been collecting information for decades on the most common production traits in domestic livestock. An ever-increasing volume of information is becoming available, but for some traits, such as meat quality, disease resistance and feed efficiency, records are very expensive or difficult to obtain, or can only be taken late in the animal's life. Because of these challenges, information for such traits is commonly collected on a reduced number of animals in any given population.

For these challenging but economically important traits, genetic markers and genomic selection offer significant opportunities for selection where it was not economically feasible before. In general, genetic markers and genomics will play an important role in improving such traits regardless of the livestock species. Genomics can also increase selection intensity, since genomic breeding values can be predicted for a large number of animals, providing more candidates for selection.

Disease resistance and genetic defects

Another group of traits with high potential for the use of molecular data and genomics are those linked to resistance, resilience, and susceptibility to diseases. A number of multifactorial or complex diseases result from the interaction between an animal's genome and environmental components. Disease resistance traits are among the most difficult to include in genetic improvement programs because they require good field measurement of the animals' disease status and systematic control of management or environmental conditions, so that the environmental influence on the animal's health status can be identified. Infectious diseases depend very much on environmental factors such as the degree of exposure to the pathogen. Thus, if exposure is low, animals will show little variation, and part of the phenotypic differences in resistance may simply reflect differences in the degree of challenge. Therefore, if genes or genetic markers linked to resistance are correctly identified, resistant animals can be selected on the basis of their genomic information. For many diseases, identifying the genes associated with resistance will require experimental conditions. Genetic tests to identify heterozygous carriers of genetic diseases caused by single recessive genes are already in use. Examples in dairy cattle include complex vertebral malformation (CVM), brachyspina (BY), cholesterol deficiency (CD) and several genes, gene regions or haplotypes causing embryo loss or stillbirth in different dairy breeds. In 2022, OMIA (Online Mendelian Inheritance in Animals) listed >1000 traits or genetic defects in livestock with a known causative mutation (cattle: 186, pig: 58, chicken: 56, sheep: 49, horse: 48, goat: 17). Including these causative alleles or associated haplotypes in a breeding program will allow producers to minimize their risk from genetic defects while maximizing genetic progress in beneficial traits.
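For a single recessive defect, the arithmetic behind using carrier status in mating decisions can be sketched as below, assuming simple Mendelian inheritance: a carrier transmits the defect allele with probability 0.5, so a carrier × carrier mating gives an affected (homozygous) calf with probability 0.25. The status labels and function names are hypothetical; real programs track many defects and haplotypes at once.

```python
from typing import Dict, List, Tuple

# Probability that a parent transmits the recessive defect allele (Mendelian).
TRANSMIT = {"free": 0.0, "carrier": 0.5, "affected": 1.0}


def affected_risk(sire_status: str, dam_status: str) -> float:
    """Probability that the offspring is homozygous for the defect allele."""
    return TRANSMIT[sire_status] * TRANSMIT[dam_status]


def flag_matings(
    sires: Dict[str, str], dams: Dict[str, str], max_risk: float = 0.0
) -> List[Tuple[str, str, float]]:
    """List (sire, dam, risk) for every candidate mating whose risk exceeds max_risk."""
    flagged = []
    for sire_id, s in sires.items():
        for dam_id, d in dams.items():
            risk = affected_risk(s, d)
            if risk > max_risk:
                flagged.append((sire_id, dam_id, risk))
    return flagged


# Hypothetical herd: one carrier bull, one defect-free bull, one carrier cow.
print(flag_matings({"BULL_1": "carrier", "BULL_2": "free"}, {"COW_7": "carrier"}))
# [('BULL_1', 'COW_7', 0.25)]
```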

Technical Aspects

DNA collection

Systematic collection of DNA is recommended in several livestock populations. DNA may be obtained from any nucleated cell in the body. Protocols for DNA extraction are now available for blood (white cells), semen, saliva (epithelial cells), hair follicles, muscle, skin, and organs (such as liver and spleen). In poultry, red blood cells may also be used, because avian red blood cells retain their nucleus, whereas mammalian red blood cells do not. Only small amounts of tissue are required for routine DNA analysis. However, if an individual's DNA has multiple future uses (whole genome sequencing, traceability, causative allele validations, …), then DNA storage costs, extraction costs, and the quality and quantity of DNA obtained by different protocols will have to be carefully examined and optimized. Common collection methods include hair follicles, tissue samples (often ear punches) in an enclosed container, blood spots on filter paper, and nasal swabs.

Data organization

A centralised database may be organised with respect to the main uses of the genetic information:

  • Parentage verification, assignment, and/or discovery
  • Traceability of meat products
  • Breed identification or breed diversity
  • Qualitative and quantitative traits

Database tables may contain:

  • Animal identification to link to all other information on the animal and its relatives.
  • Number of genetic markers: n
  • Standard name of each marker i (for i = 1, …, n)
  • Accession number for marker such as the dbSNP ID
  • Alleles for marker i
  • Genomic location of marker i
  • Effect of non-reference allele on the protein
  • Phenotypic effect of the allele
  • Association with other traits
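The table contents listed above could be represented, for example, as Python dataclasses. This is one possible layout under stated assumptions, not an ICAR-specified schema: the class and field names are hypothetical, the genomic location is split into chromosome and position on an assumed reference assembly, and a production system would normally hold these records in relational tables keyed by animal and marker identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Marker:
    """One row of a hypothetical marker table."""
    name: str                                   # standard name of marker i
    accession: Optional[str]                    # e.g. a dbSNP ID, if one exists
    alleles: List[str]                          # alleles for marker i
    chromosome: str                             # genomic location: chromosome ...
    position: int                               # ... and position on a reference assembly
    protein_effect: Optional[str] = None        # effect of non-reference allele on the protein
    phenotypic_effect: Optional[str] = None     # phenotypic effect of the allele
    associated_traits: List[str] = field(default_factory=list)  # association with other traits


@dataclass
class AnimalGenotype:
    """Genotypes of one animal, keyed by marker name."""
    animal_id: str                              # links to pedigree and phenotype records
    genotypes: Dict[str, str] = field(default_factory=dict)

    @property
    def n_markers(self) -> int:                 # number of genetic markers: n
        return len(self.genotypes)


# Hypothetical records for illustration only.
snp = Marker(name="SNP_0001", accession=None, alleles=["A", "B"], chromosome="1", position=1000)
calf = AnimalGenotype("ANIMAL_1", {"SNP_0001": "AB"})
```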

Parentage accuracy

While microsatellite and SNP markers are both ICAR-accredited methods of parentage verification, they do not provide the same parentage accuracy. Briefly, the order of accuracy for ISAG- and ICAR-approved parentage marker panels is:

Microsatellites << small SNP panels (100 or fewer SNP) < large SNP panels (500 or more SNP)

This order is based on both genotyping accuracy and total genomic information. In cattle, microsatellites have a genotyping error rate of 1-5% (Baruch and Weller, 2008[1]), while the SNP error rate is <0.1% (Cooper, Wiggans, and VanRaden 2013[2]). As 2-3 SNP provide the same parentage exclusion power as 1 microsatellite marker (Vignal et al. 2002[3]), the 100- and 200-SNP ISAG parentage panels are more accurate than the 12 ISAG parentage microsatellite markers. In the same manner, parentage panels of over 500 SNP (McClure et al. 2018[4]), such as the ICAR554, are recommended for parentage prediction, which requires an even higher level of accuracy.
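Why panel size and minor allele frequency matter can be illustrated with the standard single-parent exclusion probability for a biallelic SNP. A random, unrelated candidate is excluded when it and the offspring are opposite homozygotes; under Hardy-Weinberg equilibrium with allele frequencies p and q = 1 - p this happens with probability 2p²q², which is at most 0.125 per SNP (at MAF 0.5). The sketch below assumes independent (unlinked) SNP and error-free genotypes; the function names are hypothetical, and real panels are evaluated with observed allele frequencies.

```python
import math
from typing import Sequence


def single_parent_exclusion(p: float) -> float:
    """Chance that one SNP excludes a random non-parent (opposite homozygotes, HWE)."""
    q = 1.0 - p
    return 2.0 * p * p * q * q          # P(AA)P(BB) + P(BB)P(AA)


def prob_not_excluded(freqs: Sequence[float]) -> float:
    """Chance that a random non-parent shows no conflict on any SNP of the panel."""
    # Summing logs avoids underflow-prone products for large panels.
    return math.exp(sum(math.log1p(-single_parent_exclusion(p)) for p in freqs))


def panel_exclusion(freqs: Sequence[float]) -> float:
    """Combined exclusion probability over independent SNP: 1 - prod(1 - PE_i)."""
    return 1.0 - prob_not_excluded(freqs)


for n in (100, 200, 554):
    for maf in (0.2, 0.5):
        print(f"{n} SNP at MAF {maf}: P(random non-parent not excluded) = "
              f"{prob_not_excluded([maf] * n):.1e}")
```

Because real genotypes contain errors, validation does not require zero conflicts; this is why the parentage checks in this section allow a small conflict rate.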

Genomic quality control checks

One of the most important aspects of a large genomic database is ensuring that a genotype associated with an individual animal truly belongs to that animal. Most large livestock genomic databases deal with SNP data only, and the quality of SNP genotyping data is of paramount importance (Wu et al., Evaluation of genotyping concordance for commercial bovine SNP arrays using quality-assurance samples, Animal Genetics, 50: 367-371, 2019). This section therefore focuses on quality control for that type of genomic data. Both sample and SNP quality control measures are needed, and a system for applying them should be developed early.

For those working with genotype data there are two main concerns. The first is ensuring that the genotype data itself is of high quality and can be trusted. The second is ensuring that the genotype truly belongs to the individual listed. The recommended quality control checks below will work for any livestock species. The basic checks can be performed with minimal information about the individual, while some of the advanced checks require data that not everyone will have, such as historic animal location.

Basic genotype quality control checks for SNP-based genotype data

Genotype: 

  1. Exclude SNP that have a call rate below 90% in your population. Using 500 or more animals to determine the SNP call rate is recommended. For this filter, the call rate of chromosome Y SNP should be determined in males only.
  2. Invalidate the individual's genotype if its overall call rate is below 90%. For SNP-based genotypes, such as those from Illumina or Affymetrix chips, the accuracy of called genotypes is questionable when the individual's overall call rate is <90%.
  3. Check that the animal has all three genotype classes (i.e. AA, AB and BB) in its full genotype file. If any genotype class is missing or has a frequency below 20%, invalidate the full genotype.
  4. Check that there are no unexpected alleles in the genotype file. For example, genotypes in AB format should not contain T, G, 0, 1, 2 or 9. If such codes are present, invalidate the full genotype file.
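Checks 1 to 4 can be sketched in a few lines of Python. This assumes genotypes are exported in AB format as the strings AA, AB and BB, with `--` as the missing-call code (an assumption: export formats differ), and computes the genotype-class frequency of check 3 among called SNP. Function names are hypothetical; the thresholds default to the values given above.

```python
from collections import Counter
from typing import Dict, List, Sequence

MISSING = "--"                          # assumed no-call code; export formats differ
CLASSES = ("AA", "AB", "BB")
VALID = set(CLASSES) | {MISSING}


def snp_call_rates(genotypes: Dict[str, Sequence[str]]) -> List[float]:
    """Check 1: call rate of each SNP across animals (animal ID -> list of calls).

    For chromosome Y SNP, pass males only.
    """
    rows = list(genotypes.values())
    n_snp = len(rows[0])
    called = [sum(row[j] in CLASSES for row in rows) for j in range(n_snp)]
    return [c / len(rows) for c in called]


def sample_qc(calls: Sequence[str], min_call_rate: float = 0.90,
              min_class_freq: float = 0.20) -> List[str]:
    """Checks 2-4 for one animal; returns reasons to invalidate (empty list = pass)."""
    reasons = []
    unexpected = sorted(set(calls) - VALID)
    if unexpected:                      # check 4: e.g. T, G, 0, 1, 2 or 9
        reasons.append(f"unexpected codes {unexpected}")
    counts = Counter(g for g in calls if g in CLASSES)
    n_called = sum(counts.values())
    call_rate = n_called / len(calls) if calls else 0.0
    if call_rate < min_call_rate:       # check 2
        reasons.append(f"call rate {call_rate:.3f} below {min_call_rate}")
    for cls in CLASSES:                 # check 3, frequency among called SNP
        freq = counts[cls] / n_called if n_called else 0.0
        if freq < min_class_freq:
            reasons.append(f"genotype class {cls} at {freq:.3f}, below {min_class_freq}")
    return reasons


# Check 1 on a hypothetical two-animal example: keep SNP with call rate >= 90%.
rates = snp_call_rates({"A1": ["AA", "--", "AB"], "A2": ["AB", "BB", "--"]})
keep = [j for j, r in enumerate(rates) if r >= 0.90]
```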

Parentage:

  1. Parent (sire or dam) validation. If using 200 or fewer SNP, a listed parent validates if <1% of the offspring-parent genotypes are in conflict. A conflicting genotype is one where the offspring and the listed parent have opposite homozygous genotypes (e.g. AA and BB).
  2. Mating validation after parent validation. Among the parentage validation SNP for which the animal is heterozygous, if the sire and dam are homozygous for the same allele at >1% of those SNP, the listed mating is invalidated. This could represent a case where the offspring and one of the parents were mislabeled with each other's identification (so that the offspring's genotype belongs to the sire or dam and vice versa). In such cases, it is recommended to resample the DNA and regenotype, potentially with a panel that includes more SNP.
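Both parentage checks can be sketched as follows, assuming AB-format calls (AA, AB, BB; anything else is treated as missing) restricted to the parentage SNP panel. The 1% thresholds are the ones given above; the function names are hypothetical.

```python
from typing import Dict, Sequence

CALLED = {"AA", "AB", "BB"}   # anything else is treated as a missing call


def parent_conflict_rate(offspring: Sequence[str], parent: Sequence[str]) -> float:
    """Share of SNP called in both animals where they are opposite homozygotes."""
    compared = conflicts = 0
    for o, p in zip(offspring, parent):
        if o in CALLED and p in CALLED:
            compared += 1
            if {o, p} == {"AA", "BB"}:
                conflicts += 1
    return conflicts / compared if compared else float("nan")


def mating_conflict_rate(offspring: Sequence[str], sire: Sequence[str],
                         dam: Sequence[str]) -> float:
    """Among SNP where the offspring is AB, share where sire and dam are the same homozygote."""
    het = conflicts = 0
    for o, s, d in zip(offspring, sire, dam):
        if o == "AB" and s in CALLED and d in CALLED:
            het += 1
            if s == d and s != "AB":       # AA x AA or BB x BB cannot produce AB
                conflicts += 1
    return conflicts / het if het else float("nan")


def validate_trio(offspring: Sequence[str], sire: Sequence[str], dam: Sequence[str],
                  max_rate: float = 0.01) -> Dict[str, bool]:
    # NaN (nothing comparable) fails every comparison, so it never validates.
    sire_ok = parent_conflict_rate(offspring, sire) < max_rate
    dam_ok = parent_conflict_rate(offspring, dam) < max_rate
    # The mating is only checked once both parents have validated individually.
    mating_ok = sire_ok and dam_ok and mating_conflict_rate(offspring, sire, dam) <= max_rate
    return {"sire": sire_ok, "dam": dam_ok, "mating": mating_ok}
```

Note that a single-locus trio such as offspring AB with sire AA and dam AA passes both parent checks but fails the mating check, which is the situation the mating validation is designed to catch.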

Advanced quality control checks for SNP-based genotype data

Animal:

  1. Parentage discovery. Using SNP data to predict an animal's likely parent can be very useful, but steps must be taken to ensure a very high probability that the prediction is accurate. The following are recommended:
    1. Use 500 or more SNP that have a minor allele frequency (MAF) above 20% and call rates above 90%. It is advised to calculate the MAF across your full population. Predicted parents should have a <1% conflict rate with the animal.
    2. Sex check. Make sure a process is in place to ensure that only males are predicted as the sire and only females as the dam.
    3. Date of birth check. If you do not check that the predicted parent is older than the animal, then the predicted individual could actually be an offspring of the animal.
    4. Age gap. Cattle normally reach sexual maturity at 11-12 months of age, but this can be as early as 8-9 months, and even earlier if in-vitro fertilization is used within the population. Under normal circumstances, a minimum of 17 months between the birth dates of the animal and its predicted parent (roughly the earliest age at sexual maturity plus a gestation length of about 9 months) is recommended, to ensure that the predicted parent could have been sexually mature at the time of breeding.
    5. Grey zone SNP conflicts. When parentage discovery is conducted, the majority of candidate animals will have either <0.5% or >1.5% conflicting genotypes with the individual. Those with <0.5% pass the prediction and those with >1.5% fail; most failed animals will have >8% conflict rates. For candidates that have between 0.5% and 1.5% conflicting SNP when a set parentage panel is used (i.e. the ICAR554 SNP list), it is advised to calculate the conflict rate between the two individuals from all available SNP: if it is <1% the candidate validates as the parent, and if it is >1% it fails.
  2. Genetic Relationship Matrix (GRM). If an animal's true parent is not genotyped, it cannot be directly predicted or validated. The genetic relationship between closely related animals can, however, be used to suggest a potential, non-genotyped parent. It is recommended that 7,000 or more SNP be used to calculate the GRM. GRM results DO NOT validate a relationship; they only suggest one. Caution should be used, as GRM values can be inflated for inbred individuals. Full sibs and parent-offspring pairs should have GRM values of around 50%, while half sibs should be around 25%. The range for each group can vary by 5-10% from the expected value.
  3. Sex prediction. How a sex prediction is performed depends on the type and number of SNP an animal has from the X and Y chromosomes. While not every commercial chip includes chromosome Y SNP, they typically contain chromosome X markers.
    1. Pseudoautosomal region (PAR) SNP. As both the X and Y chromosomes contain the PAR, SNP from this region should be excluded from sex prediction. If the PAR boundaries are not published for your species, they can be roughly determined by analyzing chromosome X SNP in known males and females and identifying the region where a continuous set of SNP is heterozygous in >1% of males. Because males carry only one copy of the non-PAR region of chromosome X, SNP in that region will have an average heterozygosity of <1% in males and >>1% in females.
    2. Chromosome X prediction. Use non-PAR SNP to determine the animal's chromosome X heterozygosity rate (number of heterozygous chromosome X SNP / total number of chromosome X SNP). If the heterozygosity rate is <5%, the predicted sex is male, and if it is >15%, the predicted sex is female. If the rate is between 5% and 15%, the predicted sex is unknown.
    3. Chromosome Y prediction. Using chromosome Y SNP to predict sex is logically simpler, but many commercial chips do not contain them. If, for example, you have 7 chromosome Y SNP with high call rates in males, the following logic is recommended: male is predicted when 6-7 of the Y SNP are present (called); female is predicted when 0-1 Y SNP are present; and ambiguous sex is predicted when 2-5 Y SNP are present.
    4. Ambiguous sex prediction. If one set of sex chromosome SNP returns an ambiguous sex prediction and the other does not, it is recommended to use the latter as the predicted sex. If both SNP sets are ambiguous, the animal could have Turner syndrome (X0) or Klinefelter syndrome (XXY); in this case it is recommended to return an ambiguous predicted sex.
    5. If the predicted sexes from the chromosome X and Y analyses disagree, it is recommended to return an ambiguous predicted sex. This could also indicate a possible Klinefelter syndrome (XXY) animal.
    6. Sex-selected AI semen straws. Sex prediction should not be carried out on DNA obtained from sex-selected AI semen straws.
  4. Offspring quality control. The genotyped and listed offspring of an animal can be used to identify potential cases where the animal's genotype actually belongs to another animal. These should be used as flags that trigger an investigation, and it is recommended to temporarily invalidate the animal's genotype until it is cleared. Advised thresholds for these flags are:
    1. AI sire: >80% of genotyped offspring fail, when >10 offspring are genotyped.
    2. Stock/herd bull: >80% of genotyped offspring fail, when >5 offspring are genotyped.
    3. Dam: 100% of genotyped offspring fail when 2 offspring are genotyped, or >80% fail when >5 offspring are genotyped.
  5. Duplicate genotype. The only case where two or more animals should share the exact same genotype is if they are identical twins or clones. Checking whether more than one animal has the same genotype is therefore a useful quality control check. It is recommended to screen first with your parentage SNP set and, for any pair with >99% identical genotypes, to then use all available SNP to check whether >99% of the genotypes match.
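The parentage discovery steps in item 1 above can be combined into one candidate screen. This is a minimal sketch under stated assumptions: the panel SNP have already been filtered for MAF >20% and call rate >90%, sex is coded "M"/"F", the 17-month gap is the cattle value from the text, and the `Animal` record and function names are hypothetical. Conflict rates exactly at a boundary are treated conservatively (a grey-zone re-test, and failure at 1%).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple

CALLED = {"AA", "AB", "BB"}


@dataclass
class Animal:                      # hypothetical record layout
    animal_id: str
    sex: str                       # "M" or "F"
    birth: date
    panel: Sequence[str]           # calls on the (pre-filtered) parentage panel
    all_snp: Sequence[str]         # calls on all available SNP, for the grey zone


def conflict_rate(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of SNP called in both animals where they are opposite homozygotes."""
    pairs = [(x, y) for x, y in zip(a, b) if x in CALLED and y in CALLED]
    if not pairs:
        return 1.0                 # nothing comparable: treat as a failure
    return sum({x, y} == {"AA", "BB"} for x, y in pairs) / len(pairs)


def months_between(older: date, younger: date) -> int:
    """Whole months from older's birth date to younger's (negative if reversed)."""
    months = (younger.year - older.year) * 12 + (younger.month - older.month)
    return months - (1 if younger.day < older.day else 0)


def discover_parents(animal: Animal, candidates: Sequence[Animal],
                     min_snp: int = 500, min_gap_months: int = 17
                     ) -> List[Tuple[str, str, float]]:
    """Return (candidate_id, "sire"/"dam", conflict_rate) for candidates passing all checks."""
    if len(animal.panel) < min_snp:
        raise ValueError(f"parentage panel has fewer than {min_snp} SNP")
    found = []
    for c in candidates:
        # Date-of-birth and age-gap checks: the parent must be older by min_gap_months.
        if months_between(c.birth, animal.birth) < min_gap_months:
            continue
        # Sex check: males can only be sires, females only dams.
        role = {"M": "sire", "F": "dam"}.get(c.sex)
        if role is None:
            continue
        rate = conflict_rate(animal.panel, c.panel)
        if rate > 0.015:
            continue               # clear failure
        if rate >= 0.005:          # grey zone: re-test on all available SNP
            rate = conflict_rate(animal.all_snp, c.all_snp)
            if rate >= 0.01:
                continue
        found.append((c.animal_id, role, rate))
    return found
```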
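For the GRM in item 2 above, one common choice is VanRaden's (2008) first method, G = ZZ'/(2Σp(1 - p)), where Z holds genotypes (0, 1, 2 copies of the B allele) centred by twice the allele frequency p. On this scale parent-offspring and full-sib pairs are expected near 0.5 and half sibs near 0.25, while diagonal values are near 1 plus the inbreeding coefficient, which is one reason inbreeding inflates values. The sketch below is pure Python for clarity (a linear-algebra library would be used with 7,000+ SNP); allele frequencies are estimated from the animals supplied, missing calls are mean-imputed, and the function names and the 0.40-0.60 window are hypothetical choices based on the ranges given above.

```python
from typing import List, Optional, Sequence, Tuple


def vanraden_grm(geno: Sequence[Sequence[Optional[int]]]) -> List[List[float]]:
    """Genomic relationship matrix following VanRaden's (2008) first method.

    geno[i][j] = copies of the B allele (0, 1, 2) for animal i at SNP j; None = missing.
    """
    n_animals, n_snp = len(geno), len(geno[0])
    p = []
    for j in range(n_snp):
        calls = [row[j] for row in geno if row[j] is not None]
        p.append(sum(calls) / (2 * len(calls)) if calls else 0.0)
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNP are monomorphic")
    # Centre each genotype by its expected value 2p; missing calls contribute 0.
    z = [[(row[j] - 2 * p[j]) if row[j] is not None else 0.0 for j in range(n_snp)]
         for row in geno]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n_animals)]
            for i in range(n_animals)]


def suggest_relatives(ids: Sequence[str], grm: List[List[float]],
                      low: float = 0.40, high: float = 0.60) -> List[Tuple[str, str, float]]:
    """Pairs near 0.5 (parent-offspring or full sibs). Suggestions only, never validation."""
    out = []
    for i in range(len(ids)):
        for k in range(i + 1, len(ids)):
            if low <= grm[i][k] <= high:
                out.append((ids[i], ids[k], round(grm[i][k], 3)))
    return out
```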
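The sex prediction rules in item 3 above can be sketched as three small functions. The sketch assumes PAR SNP have already been removed from the chromosome X list, that calls are AB-format strings, and that a Y SNP counts as "present" when it has a genotype call; the Y thresholds default to the 7-SNP example, treating 0-1 called SNP as female since 2-5 is the ambiguous range. Other panel sizes would need their own thresholds, and the function names are hypothetical.

```python
from typing import Optional, Sequence

CALLED = {"AA", "AB", "BB"}


def sex_from_x(x_calls: Sequence[str], male_max: float = 0.05,
               female_min: float = 0.15) -> str:
    """Sex from non-PAR chromosome X SNP (PAR SNP must be removed beforehand)."""
    if not x_calls:
        return "ambiguous"
    het_rate = sum(g == "AB" for g in x_calls) / len(x_calls)  # heterozygous / total X SNP
    if het_rate < male_max:
        return "M"
    if het_rate > female_min:
        return "F"
    return "ambiguous"


def sex_from_y(y_calls: Sequence[str], male_min: int = 6, female_max: int = 1) -> str:
    """Sex from chromosome Y SNP; defaults follow the 7-SNP example in the text."""
    present = sum(g in CALLED for g in y_calls)
    if present >= male_min:
        return "M"
    if present <= female_max:
        return "F"
    return "ambiguous"


def predict_sex(x_calls: Sequence[str], y_calls: Optional[Sequence[str]] = None) -> str:
    x = sex_from_x(x_calls)
    y = sex_from_y(y_calls) if y_calls else "ambiguous"   # chip may lack Y SNP
    if x == "ambiguous":
        return y              # use Y if it is informative, otherwise stay ambiguous
    if y == "ambiguous" or y == x:
        return x
    return "ambiguous"        # X and Y disagree (possibly XXY)
```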
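The offspring flag thresholds in item 4 above translate directly into a small rule function. The role labels are hypothetical, and because the text gives no threshold for a dam with 3-5 genotyped offspring, the sketch does not flag that case.

```python
def offspring_flag(role: str, n_genotyped: int, n_failed: int) -> bool:
    """True if the animal's own genotype should be held pending investigation.

    role: "ai_sire", "herd_bull" or "dam" (hypothetical labels).
    """
    if n_genotyped == 0:
        return False
    share_failed = n_failed / n_genotyped
    if role == "ai_sire":
        return n_genotyped > 10 and share_failed > 0.80
    if role == "herd_bull":
        return n_genotyped > 5 and share_failed > 0.80
    if role == "dam":
        if n_genotyped == 2:
            return n_failed == 2          # both genotyped offspring fail
        # 3-5 offspring: no threshold is given in the text, so no flag here.
        return n_genotyped > 5 and share_failed > 0.80
    raise ValueError(f"unknown role {role!r}")
```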
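The two-stage duplicate screen in item 5 above might look like this. It assumes AB-format calls, measures concordance over SNP called in both animals, and compares every pair, which is fine for a sketch but would need indexing or hashing in a large database. Function names are hypothetical; any pair it reports still has to be checked against known identical twins and clones.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

CALLED = {"AA", "AB", "BB"}


def concordance(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of SNP called in both animals that have identical genotypes."""
    pairs = [(x, y) for x, y in zip(a, b) if x in CALLED and y in CALLED]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0


def find_duplicates(panel: Dict[str, Sequence[str]], full: Dict[str, Sequence[str]],
                    threshold: float = 0.99) -> List[Tuple[str, str]]:
    """Screen on the parentage panel first, then confirm on all SNP."""
    dups = []
    for a, b in combinations(sorted(panel), 2):
        # `and` short-circuits, so the full comparison runs only for panel hits.
        if (concordance(panel[a], panel[b]) > threshold
                and concordance(full[a], full[b]) > threshold):
            dups.append((a, b))
    return dups
```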

For standardization of gene or locus nomenclature, guidelines are available at https://www.genenames.org/about/guidelines#genenames, and for marker nomenclature at http://www.HGVS.org/varnomen.

ICAR Services Related to DNA Technology

ICAR offers three services related to the use of DNA, all of which are linked to parentage analysis in one form or another, as shown in Figure 1. These services are described in more detail in the other sections.

Figure 1. ICAR DNA services.

ICAR Accreditation of Laboratories Providing DNA Genotyping Services

Sub-sections

References

  1. Baruch, E., and J. I. Weller. 2008. 'Estimation of the number of SNP genetic markers required for parentage verification', Animal Genetics, 39: 474-479.
  2. Cooper, T. A., G. R. Wiggans, and P. M. VanRaden. 2013. 'Short communication: relationship of call rate and accuracy of single nucleotide polymorphism genotypes in dairy cattle', Journal of Dairy Science, 96: 3336-3339.
  3. Vignal, A., D. Milan, M. SanCristobal, and A. Eggen. 2002. 'A review on SNP and other types of molecular markers and their use in animal genetics', Genetics Selection Evolution, 34: 275-305.
  4. McClure, M. C., J. McCarthy, P. Flynn, J. C. McClure, E. Dair, D. K. O'Connell, and J. F. Kearney. 2018. 'SNP Data Quality Control in a National Beef and Dairy Cattle System and Highly Accurate SNP Based Parentage Verification and Identification', Frontiers in Genetics, 9: 84.