Natural variation in the E and J genes is responsible for the adaptability of soybean to cultivation at various latitudes, including the tropics, with several polymorphic sites significantly associated with flowering time (Jiang et al., 2019; Lu et al., 2017). Thus, these genes may represent strong candidates for elucidating the natural variation that exists for photoperiod sensitivity in Cannabis.
However, although candidate gene searches can be informative, gene functions are not always conserved across species. As such, gene mapping approaches (genome-wide association studies, quantitative trait loci mapping) and functional analyses are still required to elucidate the flowering time network in Cannabis. Furthermore, as a wind-pollinated, dioecious species, Cannabis is primarily outcrossing (Salentijn et al., 2015). In self-pollinating species such as rice, Sorghum and Arabidopsis, variation in flowering time is controlled by a few large-effect loci (Gage et al., 2020). In contrast, in the model outcrossing species maize, numerous loci each contribute a small effect to complex traits such as flowering time (Buckler et al., 2009). It remains to be seen how complex traits are determined in Cannabis, but the successful elucidation of complex traits in maize may motivate the development of sophisticated multi-parental mapping populations such as Nested Association Mapping (NAM) or Multiparent Advanced Generation InterCross (MAGIC) populations (Gage et al., 2020).

7. The long read: Cannabis genetics and genomics

While the medicinal, agricultural, and industrial applications of Cannabis are vast, Cannabis genomics has lagged behind that of other crops. In recent years, however, as legal restrictions have eased and with the advent of third-generation sequencing, the field has advanced significantly. As a result, there is now a wealth of novel data readily available for analysis. Here, the currently available genomics and transcriptomics data will be reviewed.
The Cannabis genome is diploid (2n=20), consisting of 9 pairs of autosomes and a pair of heteromorphic sex chromosomes (X and Y). The haploid genome size is predicted to be 843 Mb and 818 Mb for male (XY) and female (XX) plants respectively, with the larger Y chromosome accounting for the sex-specific difference in genome size (see also discussion above on sex determination) (Sakamoto et al., 1998). The Cannabis genome is not large relative to those of other crops such as maize and wheat. However, it has been difficult to resolve due to its high heterozygosity and the abundance of repetitive DNA sequences. High heterozygosity levels have been retained in the genome, as Cannabis is dioecious and has not been subject to intense breeding (Lynch et al., 2016; Sawler et al., 2015). While this genetic diversity is desirable for selective breeding, it can complicate genome assembly. Highly variable alleles are often misassembled as segmental duplications, whereby both haplotypes are incorporated at distinct loci, inflating the genome assembly size (Claros et al., 2012; Michael and VanBuren, 2020). Additionally, it is estimated that repetitive sequences constitute ~70% of the Cannabis genome (Gao et al., 2020; Laverty et al., 2019; Pisupati et al., 2018). Misassembly of multiple repeat elements onto one contig causes assembly collapse, reducing the genome assembly size (Claros et al., 2012; Michael and VanBuren, 2020). These features are particularly challenging when short-read sequencing is applied. While the first draft Cannabis genome, from the marijuana Purple Kush (PK) cultivar, was sequenced in 2011, the short-read sequencing of that time could not resolve repeat-rich, low-complexity regions (van Bakel et al., 2011). The result was a valuable yet incomplete genome assembly of 534 Mb (van Bakel et al., 2011).
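The difficulty that repeats pose for short reads can be illustrated with a minimal Python sketch. It is a toy, not an assembler: the sequence, the repeat and the read lengths are all hypothetical and chosen only to make the effect visible, and reads are assumed to be error-free. The function counts how many reads of a given length could have originated from more than one position.

```python
from collections import Counter


def ambiguous_fraction(genome: str, read_len: int) -> float:
    """Fraction of error-free reads of length read_len whose sequence
    occurs at more than one position in the genome."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    return sum(1 for read in reads if counts[read] > 1) / len(reads)


# Hypothetical toy sequence (not real Cannabis sequence): three A/T-only
# flanks separated by two identical copies of a 6 bp repeat.
REPEAT = "GGGCCC"
FLANKS = ("ATTATAAT", "TAATTTAA", "AATATTTA")
TOY_GENOME = FLANKS[0] + REPEAT + FLANKS[1] + REPEAT + FLANKS[2]

short_frac = ambiguous_fraction(TOY_GENOME, 4)   # reads shorter than the repeat
long_frac = ambiguous_fraction(TOY_GENOME, 10)   # reads longer than the repeat

print(f"4 bp reads with more than one possible origin: {short_frac:.2f}")
print(f"10 bp reads with more than one possible origin: {long_frac:.2f}")
```

In this toy, some of the 4 bp reads fall entirely within the repeat and cannot be placed, whereas every 10 bp read extends into a flank and is unique. Real genomes add sequencing errors, heterozygosity and repeats far longer than any short read, so the sketch only conveys the principle.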
Third-generation single-molecule (or long-read) sequencing, such as Pacific Biosciences (PacBio) and Oxford Nanopore sequencing, generates long reads that can capture the regions flanking repeat sequences and segmental duplications. Thus, long-read sequencing greatly facilitates the assembly process and has been revolutionary for plant genomics, enabling chromosome-level assemblies to be achieved (Jiao and Schneeberger, 2017; Michael and VanBuren, 2020). Recently, long-read sequencing, in parallel with genetic and physical mapping, has enabled four chromosome-level assemblies, for the CBDRx, PK and Finola (FN) cultivars and a wild Cannabis line (CR) (Table 1) (Gao et al., 2020; Grassa et al., 2018; Laverty et al., 2019). The CBDRx genome (a female individual) was sequenced using Oxford Nanopore technology and has an assembly size of 876.148 Mb (Figure 5a) (Grassa et al., 2018). In 2019, the first genome-wide annotation was made available for this Cannabis genome, making it the reference genome on the NCBI database (Jenkins and Orsburn, 2019). The PK and FN genomes were sequenced with PacBio single-molecule sequencing (Laverty et al., 2019). The PK (female) and FN (male) assembly sizes are 891.965 Mb and 1009.67 Mb respectively, both of which are significant improvements upon the original draft PK genome from 2011 (Table 1) (van Bakel et al., 2011; Laverty et al., 2019). The CR variety, which is derived from a wild Cannabis plant, was also sequenced with PacBio, achieving a genome assembly size of 812.525 Mb (Gao et al., 2020). While linkage maps were generated to resolve the chromosomes for the CBDRx, PK and FN genomes (Grassa et al., 2018; Laverty et al., 2019), Hi-C data were used to create a physical map for CR, enabling the chromosomes to be assembled (Supplementary Table 1) (Gao et al., 2020).
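To put these assembly sizes in context, the short Python sketch below divides each one by the estimated haploid genome size for the corresponding sex, using only the figures quoted in this section. Two assumptions are made for illustration: the 2011 PK draft is compared against the female estimate (the later PK assembly is described as female), and CR is omitted because the sex of the sequenced plant is not stated here. A ratio above 1 would be consistent with some haplotype duplication and a ratio below 1 with collapse or missing sequence, although the genome size estimates themselves carry uncertainty, so the ratios are indicative only.

```python
# Estimated haploid genome sizes quoted in the text (Mb).
ESTIMATED_MB = {"female": 818.0, "male": 843.0}

# (label, sex, assembly size in Mb), as quoted in the text.
# Assumption: the 2011 PK draft is compared against the female estimate
# because the later PK assembly is described as female.
ASSEMBLIES = [
    ("PK draft (2011)", "female", 534.0),
    ("CBDRx", "female", 876.148),
    ("PK", "female", 891.965),
    ("FN", "male", 1009.67),
]


def size_ratio(assembly_mb: float, sex: str) -> float:
    """Assembly size divided by the estimated haploid genome size."""
    return assembly_mb / ESTIMATED_MB[sex]


ratios = {label: size_ratio(size, sex) for label, sex, size in ASSEMBLIES}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f}x the estimated haploid genome size")
```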