Natural variation in the E and J genes is responsible for
the adaptability of soybean to cultivation at various latitudes,
including the tropics, with several polymorphic sites significantly
associated with flowering time
(Jiang et al., 2019;
Lu et al., 2017). Thus, these genes may represent strong candidates for
elucidating the natural variation that exists for photoperiod
sensitivity in Cannabis.
However, although candidate gene searches can be informative, gene
functions are not necessarily conserved across
species. As such, gene mapping approaches (genome-wide association
studies, quantitative trait loci mapping) and functional analyses are
still required to elucidate the flowering time network in Cannabis. Furthermore, as a wind-pollinated, dioecious species, Cannabis is primarily outcrossing
(Salentijn et al.,
2015). In self-pollinating species such as rice, Sorghum and Arabidopsis, variation in flowering time is controlled by a few
large-effect loci
(Gage et al., 2020).
However, in the model outcrossing species maize, numerous loci each
contribute a small effect to complex traits such as
flowering time
(Buckler et al.,
2009). It remains to be seen how complex traits are genetically controlled in Cannabis, but the successful dissection
of complex traits in maize may motivate the development of sophisticated
multi-parental mapping populations such as Nested Association Mapping
(NAM) or Multiparent Advanced Generation InterCross (MAGIC) populations
(Gage et al., 2020).
7. The long read: Cannabis genetics and genomics
While the medicinal, agricultural, and industrial applications of Cannabis are vast, Cannabis genomics has lagged
behind that of other crops. In recent years, however, as legal
restrictions have eased and third-generation sequencing has become
available, the field has advanced rapidly. As a result, there is
now a wealth of novel data readily available for analysis. Here, the
currently available genomics and transcriptomics data will be reviewed.
The Cannabis genome is diploid (2n = 20), consisting of nine pairs of autosomes
and a pair of heteromorphic sex chromosomes (X and Y). The haploid
genome size is estimated to be 843 Mb and 818 Mb for male (XY) and
female (XX) plants, respectively, with the larger Y chromosome accounting
for the sex-specific difference in genome size (see also discussion
above on sex determination)
(Sakamoto et al.,
1998). The Cannabis genome is not large relative to that of
other crops like maize and wheat. However, it has been difficult to
resolve due to its high heterozygosity and the abundance of repetitive
DNA sequences. High heterozygosity levels have been retained in the
genome, as Cannabis is dioecious and has not been subject to
intense breeding
(Lynch et al., 2016;
Sawler et al., 2015). While this genetic diversity is desirable for
selective breeding, it can complicate genome assembly. Highly variable
alleles are often misassembled as segmental duplications, whereby both
haplotypes are incorporated at distinct loci, inflating the genome
assembly size (Claros
et al., 2012; Michael and VanBuren, 2020). Additionally, it is
estimated that repetitive sequences constitute ~70% of
the Cannabis genome
(Gao et al., 2020;
Laverty et al., 2019; Pisupati et al., 2018). Misassembly of multiple
repeat elements onto one contig causes assembly collapse, reducing the
genome assembly size
(Claros et al., 2012;
Michael and VanBuren, 2020). These features are particularly
challenging when short-read sequencing is applied. Although the first draft Cannabis genome, from the marijuana Purple Kush (PK) cultivar,
was sequenced in 2011, the short-read technology of the time could not
resolve repeat-rich, low-complexity regions
(van Bakel et al.,
2011). This led to a valuable yet incomplete genome assembly
of 534 Mb
(van Bakel et al.,
2011).
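The size figures quoted above can be compared with simple arithmetic. The sketch below is illustrative only: it uses the haploid estimates and the draft assembly size given in the text, the variable and function names are our own, and comparing the PK draft with the female estimate assumes the sequenced PK plant was female, as stated for the later PK assembly.

```python
# Haploid genome size estimates quoted in the text (Sakamoto et al., 1998), in Mb.
HAPLOID_ESTIMATE_MB = {"male": 843.0, "female": 818.0}

# Size of the 2011 short-read Purple Kush draft (van Bakel et al., 2011), in Mb.
DRAFT_PK_MB = 534.0


def sex_difference_mb(estimates):
    """Difference between the male and female haploid estimates, in Mb."""
    return estimates["male"] - estimates["female"]


def fraction_of_estimate(assembly_mb, estimate_mb):
    """Assembly size expressed as a fraction of an estimated genome size."""
    return assembly_mb / estimate_mb


difference = sex_difference_mb(HAPLOID_ESTIMATE_MB)
# Assumption: the draft is compared with the female (XX) estimate.
draft_fraction = fraction_of_estimate(DRAFT_PK_MB, HAPLOID_ESTIMATE_MB["female"])

print(f"Male minus female haploid estimate: {difference:.0f} Mb")
print(f"2011 PK draft as a share of the female estimate: {draft_fraction:.1%}")
```

On these figures, the short-read draft recovers roughly two-thirds of the estimated haploid genome, which is the direction expected if repeat-rich regions collapsed or went unassembled.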
Third-generation single-molecule (or long-read) sequencing, such as
Pacific Biosciences (PacBio) and Oxford Nanopore sequencing, can
generate reads long enough to span repeat sequences and segmental
duplications together with their flanking regions. Thus, long-read sequencing
greatly facilitates the assembly process and has been revolutionary for
plant genomics, enabling chromosome-level assemblies to be achieved
(Jiao and
Schneeberger, 2017; Michael and VanBuren, 2020). Recently, long-read
sequencing, in parallel with genetic and physical mapping, has enabled
chromosome-level assemblies of four lines: CBDRx, PK, Finola (FN), and a
wild Cannabis line (CR) (Table 1)
(Gao et al., 2020;
Grassa et al., 2018; Laverty et al., 2019). The CBDRx genome (a female
individual) was sequenced using Oxford Nanopore technology and has an
assembly size of 876.148 Mb (Figure 5a)
(Grassa et al.,
2018). In 2019, the first genome-wide annotation was made available for
this Cannabis genome, making it the reference genome on the NCBI
database (Jenkins and
Orsburn, 2019). The PK and FN genomes were sequenced with PacBio single-molecule sequencing
(Laverty et al.,
2019). The PK (female) and FN (male) assembly sizes are 891.965 Mb and
1009.67 Mb respectively, both of which are significant improvements upon
the original draft PK genome from 2011 (Table 1)
(van Bakel et al.,
2011; Laverty et al., 2019). The CR variety, which is derived from a
wild Cannabis plant, was also sequenced with PacBio, achieving a
genome assembly size of 812.525 Mb
(Gao et al., 2020).
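As a rough cross-check, each long-read assembly size can be expressed relative to the sex-matched haploid estimate of Sakamoto et al. (1998). This is a minimal sketch under stated assumptions: the assembly sizes and sexes are those given in the text, the sex of the CR individual is not stated here and is therefore compared against both estimates, and all names in the code are illustrative.

```python
# Haploid genome size estimates (Sakamoto et al., 1998), in Mb.
ESTIMATE_MB = {"female": 818.0, "male": 843.0}

# Assembly sizes (Mb) and sex of the sequenced individual, as given in the text.
ASSEMBLIES = {
    "CBDRx": (876.148, "female"),
    "PK": (891.965, "female"),
    "FN": (1009.67, "male"),
    "CR": (812.525, None),  # sex not stated in the text (assumption: unknown)
}


def size_ratios(assembly_mb, sex):
    """Ratio of assembly size to the haploid estimate(s).

    If the sex is unknown, the ratio is reported against both estimates.
    """
    sexes = [sex] if sex is not None else sorted(ESTIMATE_MB)
    return {s: assembly_mb / ESTIMATE_MB[s] for s in sexes}


ratios = {name: size_ratios(mb, sex) for name, (mb, sex) in ASSEMBLIES.items()}

for name, by_sex in ratios.items():
    for sex, ratio in by_sex.items():
        print(f"{name}: {ratio:.2f}x the {sex} haploid estimate")
```

A ratio above 1 is consistent with the haplotype duplication described earlier, and a ratio below 1 with repeat collapse, but neither can be inferred from size alone, since the cytometric estimates themselves carry uncertainty.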
Whereas linkage maps were generated to resolve the chromosomes for the
CBDRx, PK and FN genomes
(Grassa et al., 2018;
Laverty et al., 2019), Hi-C data was used to create a physical map for
CR, enabling the chromosomes to be assembled (Supplementary Table 1)
(Gao et al., 2020).