Tomato DB - Methods

Methods

Plant material

The majority of tomato (Solanum lycopersicum L.) ecotype seeds were collected from Melitakes (a public benefit cooperative enterprise). Most ecotypes were grown in the greenhouse of the Department of Agriculture at the Hellenic Mediterranean University. Some other ecotypes were grown in the greenhouse of the Department of Biology, University of Crete. Young leaves (not fully developed) were collected and stored at -80 °C.

DNA extraction

DNA was extracted from 100-150 mg fresh weight of leaf tissue. The extraction method was based on the CTAB (hexadecyltrimethylammonium bromide) protocol described by Bibi et al. (2020). The protocol was developed mainly for grape (Vitis vinifera L.), but it worked very well for tomato. The extraction buffer consisted of 2% CTAB, 1.4 M NaCl, 100 mM Tris, 20 mM EDTA, 0.5% BME (β-mercaptoethanol), pH 8.0, and 2% PVP (MW = 360,000).

Nuclear microsatellite PCR

Primer sequences for the nuclear microsatellite loci ssrSLR3, ssrSLR4, ssrSLR5, ssrSLR10, ssrSLR13, ssrSLR20, ssrSLR21, ssrSLR23, ssrSLR26, Le002, and Le015 (Korir et al. 2014) were used for DNA amplification. PCR amplifications were carried out in 96-well polypropylene plates in a Biorad-100 thermal cycler (Biorad, 1000 Alfred Nobel Drive, Hercules, California 94547, USA). Each reaction had a final volume of 20 μL and contained 20 ng/μL genomic DNA, 0.2 μM of the labeled forward primer, 0.2 μM of the unlabeled reverse primer, 0.2 mM dNTP, 1 U Taq polymerase (Biotools, Madrid, Spain), and 2 mM MgCl2.
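
The reaction mixture above can be turned into pipetting volumes with the usual dilution relation C1 x V1 = C2 x V2. The following is a minimal sketch only: the final concentrations (0.2 μM forward primer, 0.2 mM dNTP, 2 mM MgCl2, 20 μL final volume) come from the text, whereas every stock concentration and all function names are hypothetical values chosen for illustration and are not reported in this study.

```python
# Dilution arithmetic (C1 * V1 = C2 * V2) for one 20 uL PCR reaction.
# Final concentrations come from the text; every stock concentration
# below is a HYPOTHETICAL value chosen only for illustration.

FINAL_VOLUME_UL = 20.0

# component -> (final concentration, hypothetical stock concentration),
# both expressed in the unit given in the component name
COMPONENTS = {
    "forward primer (uM)": (0.2, 10.0),
    "dNTP (mM)": (0.2, 10.0),
    "MgCl2 (mM)": (2.0, 50.0),
}


def stock_volume_ul(final_conc, stock_conc, final_volume_ul=FINAL_VOLUME_UL):
    """Volume of stock (uL) needed so the final mix reaches final_conc."""
    return final_conc * final_volume_ul / stock_conc


def reaction_volumes(components=COMPONENTS, final_volume_ul=FINAL_VOLUME_UL):
    """Return the stock volume (uL) of each component for one reaction."""
    return {
        name: stock_volume_ul(final, stock, final_volume_ul)
        for name, (final, stock) in components.items()
    }


if __name__ == "__main__":
    for name, volume in reaction_volumes().items():
        print(f"{name}: {volume:.2f} uL")
```

With different stock solutions, only the second value of each pair in COMPONENTS needs to change.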

The forward primer was in each case labeled with the Licor IR800 fluorochrome. The following thermal cycling protocol was applied: 95 °C for 5 min; 40 cycles of 30 s at 94 °C, 30 s at the annealing temperature of each primer set, and 30 s at 72 °C; then 10 min at 72 °C; and an indefinite hold at 7 °C. The resulting PCR products were first visualized by 0.8% agarose gel electrophoresis. Prior to SSR fractionation, up to three different primer pairs were mixed in the same well (multiplex), taking into account the size of the amplified fragments and/or the labeling of the primers. The products were loaded into an Applied Biosystems SeqStudio Genetic Analyzer (Thermo Scientific, USA) for SSR fractionation. Allele binning and data matrix production were done within STRand, version 2.4.108 (Veterinary Genetics Lab, University of California). During fragment analysis, the LIZ600 size standard of Applied Biosystems was employed.

Genetic Analysis and Neighbor‑Joining Tree Construction

Per locus, allele sizing was based on published repeat patterns (Carvalho et al. 2021). Data matrices were produced, and genetic diversity measures were determined for each locus across all fingerprinted genotypes. These measures included (i) the polymorphic information content (PIC) of each locus (Botstein et al., 1980), (ii) observed heterozygosity (HO), and (iii) expected heterozygosity (HE). PIC, HO, HE, the estimated frequency of null alleles, and the probability of identity (PI) were calculated with the CERVUS software package, version 3.0.3 (Kalinowski and Taper, 2007). A similarity matrix was produced from Nei's distance matrix within GenAlEx version 6 (Peakall and Smouse, 2006). Subsequently, a neighbor-joining tree based on Nei's genetic distance, with bootstrap values on its branches, was produced using the aboot function of the poppr package in R (v. 4.1.3). The 83 tomato individuals were grouped into 29 ecotypes, which were used for dendrogram construction. To estimate the divergence between populations, pairwise Fst values were calculated according to Weir and Cockerham (1984) using GenAlEx 6 (Peakall and Smouse, 2006). Analysis of molecular variance (AMOVA) was also performed in GenAlEx 6 to assess the genetic structure of the 29 tomato ecotypes.
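
For readers unfamiliar with the per-locus diversity measures, the sketch below shows how HO, HE, and PIC can be computed from diploid genotypes at a single locus. It is an illustration under stated assumptions, not the CERVUS implementation: the function names and the toy genotypes are hypothetical, HE is the plain 1 - sum(p_i^2) without any small-sample correction that a software package may apply, and PIC follows the formula of Botstein et al. (1980).

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies at one locus from diploid genotypes [(a1, a2), ...]."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """HO: fraction of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes):
    """HE: 1 - sum(p_i^2); no small-sample correction is applied here."""
    freqs = allele_frequencies(genotypes)
    return 1.0 - sum(p * p for p in freqs.values())


def pic(genotypes):
    """PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = list(allele_frequencies(genotypes).values())
    homozygosity = sum(x * x for x in p)
    cross = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1.0 - homozygosity - cross


if __name__ == "__main__":
    # Hypothetical fragment sizes (bp) for four individuals at one SSR locus.
    toy = [(100, 100), (100, 104), (104, 108), (108, 108)]
    print("HO :", observed_heterozygosity(toy))
    print("HE :", expected_heterozygosity(toy))
    print("PIC:", pic(toy))
```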

Population STRUCTURE

STRUCTURE is a freely available program for population analysis developed by Pritchard et al. (2000a). It analyses differences in the distribution of genetic variants among populations with a Bayesian iterative algorithm by placing samples into groups. The program uses a systematic Bayesian clustering approach based on Markov chain Monte Carlo (MCMC) estimation. It fits to the data a model of K assumed populations or genetic groups, each characterized by a set of allele frequencies identified in the data. The genetic structure of the individuals was analyzed using STRUCTURE 2.3.4, which identifies subpopulations, assigns individuals to them, and estimates population allele frequencies (Pritchard et al., 2000). The analysis was carried out with a burn-in period of 200,000 iterations and a run length of 800,000 MCMC replications. We tested a continuous series of K, from 1 to 10, in 10 independent runs. We did not introduce any prior knowledge about the origin of the population, and we assumed correlated allele frequencies and admixture (Falush et al., 2003). To select the optimal value of K, ΔK values (Evanno et al., 2005) were calculated using STRUCTURE HARVESTER (Earl and vonHoldt, 2012). POPHELPER (Francis, 2016) was used to analyze and visualize population structure.
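
The ΔK criterion can be illustrated with a short sketch. It uses one common formulation, in which the absolute second difference of the mean log-likelihood across K is divided by the standard deviation of the replicate runs at K. The function name and all log-likelihood values are hypothetical, and the code is not taken from STRUCTURE HARVESTER.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Evanno-style delta K from replicate log-likelihoods.

    lnp_by_k maps each K to the list of ln P(D) values of its replicate runs.
    Delta K is defined only for K values that have both neighbours K-1 and K+1.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # first and last K have no second difference
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # delta K is undefined when replicates are identical
        result[k] = second_diff / spread
    return result


if __name__ == "__main__":
    # HYPOTHETICAL ln P(D) values: three replicate runs for each K.
    runs = {
        1: [-1000, -1002, -998],
        2: [-900, -901, -899],
        3: [-880, -884, -876],
        4: [-870, -875, -865],
    }
    values = delta_k(runs)
    best_k = max(values, key=values.get)
    print(values, "-> K with the largest delta K:", best_k)
```

Because ΔK needs both neighbouring values, it cannot be evaluated for the smallest or the largest K tested.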

Multidimensional Scaling Analysis

Multidimensional scaling (MDS) is a computational approach used to visualize the level of similarity (or dissimilarity) between high-dimensional individuals as a configuration of points mapped into a Cartesian space (Mead A., 1992). MDS is a distance-based method. Here, we applied the Reynolds distance (Reynolds J. et al., 1983) between the populations of the sample. The Reynolds distance (or coancestry distance) provides an estimate of the genetic drift between populations. The Reynolds approach is considered appropriate for data with small mutation rates under the infinite alleles model. Even though the method was developed for allozymes, which have a small mutation rate compared to microsatellites, it is considered appropriate for small populations, in which genetic drift considerably affects evolution. In such scenarios (which apply to our data as well), SSR mutations do not show the bell-like distribution expected under the stepwise mutation model. Instead, the allelic distribution resembles that of the infinite alleles model. MDS and the Reynolds distance were calculated using the R programming language (v. 4.1.3) and the packages poppr and adegenet. Since distances were calculated between populations (and not between individuals), we used the function genind2genpop from the adegenet R package to convert individual genotype data into allele counts per population.
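
The two steps described above, pooling individual genotypes into allele counts per population and computing a Reynolds coancestry distance between two populations, can be sketched as follows. This is an illustration in Python, not the adegenet code used in the study: all function names and the toy data are hypothetical, the distance is written as D = sqrt( sum over loci and alleles of (p_a - p_b)^2 / (2 x sum over loci of (1 - sum over alleles of p_a x p_b)) ), and the MDS step itself is omitted.

```python
from collections import Counter, defaultdict
from math import sqrt


def population_allele_counts(individuals):
    """Pool individual genotypes into allele counts per population and locus.

    individuals: list of (population, {locus: (allele1, allele2)}) records.
    Returns {population: {locus: Counter({allele: count})}}.
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    for population, genotype in individuals:
        for locus, alleles in genotype.items():
            counts[population][locus].update(alleles)
    return counts


def frequencies(locus_counts):
    """Convert the allele counts of one locus into frequencies."""
    total = sum(locus_counts.values())
    return {allele: n / total for allele, n in locus_counts.items()}


def reynolds_distance(pop_a, pop_b):
    """Reynolds coancestry distance between two populations.

    pop_a, pop_b: {locus: Counter({allele: count})}; only shared loci are used.
    """
    numerator = 0.0
    denominator = 0.0
    for locus in set(pop_a) & set(pop_b):
        fa, fb = frequencies(pop_a[locus]), frequencies(pop_b[locus])
        alleles = set(fa) | set(fb)
        numerator += sum((fa.get(x, 0.0) - fb.get(x, 0.0)) ** 2 for x in alleles)
        denominator += 1.0 - sum(fa.get(x, 0.0) * fb.get(x, 0.0) for x in alleles)
    if denominator == 0:
        # Both populations are fixed for the same allele at every shared locus.
        return 0.0
    return sqrt(numerator / (2.0 * denominator))


if __name__ == "__main__":
    # HYPOTHETICAL genotypes: two populations, one SSR locus.
    individuals = [
        ("A", {"L1": (100, 104)}),
        ("A", {"L1": (100, 104)}),
        ("B", {"L1": (100, 100)}),
    ]
    counts = population_allele_counts(individuals)
    print(reynolds_distance(counts["A"], counts["B"]))
```

A full pairwise matrix of such distances between all populations would then be the input to the MDS.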

The Greek Tomato Database

The data were first converted to CSV files and imported into the database via phpMyAdmin (www.phpmyadmin.net, version 5.1.0), which was configured and used as the main data management tool. The platform was deployed to a new server, and the content is served via an Apache web server (httpd.apache.org, version 2.4.41). All work was done on a PC running the Linux distribution Ubuntu, version 20.04 LTS. The web hosting server also runs a Linux distribution, Ubuntu Server (ubuntu.com, version 20.04.01 LTS). The website was developed with the Laravel PHP framework, version 7.30.4, and its content is served with the web scripting language PHP, version 7.4.3. The database is stored on MySQL server version 8.0.28.