Dropping Hints : Estimating the diets of livestock in rangelands using DNA metabarcoding of faeces

The introduction of domesticated animals into new environments can lead to considerable ecological disruption, and it can be difficult to predict their impact on the new ecosystem. In this study, we use faecal metabarcoding to characterize the diets of three ruminant taxa in the rangelands of south-western New South Wales, Australia. Our study organisms included goats (Capra aegagrus hircus) and two breeds of sheep (Ovis aries): Merinos, which have been present in Australia for over two hundred years, and Dorpers, which were introduced in the 1990s. We used High-Throughput Sequencing methods to sequence the rbcL and ITS2 genes of plants in the faecal samples, and identified the samples using the GenBank and BOLD online databases, as well as a reference collection of sequences from plants collected in the study area. We found that the diets of all three taxa were dominated by the family Malvaceae, and that the Dorper diet was more diverse than the Merino diet at both the family and the species level. We conclude that Dorpers, like Merinos, are potentially a threat to some vulnerable species in the rangelands of New South Wales.


Introduction
Estimating the diet of an animal by analysing its faeces has many advantages: it is non-invasive and does not require time consuming observation of the animal.Microscopic analysis of plant fragments in animal faeces has long been used to investigate diets (Todd and Hansen 1973) but DNA-based dietary analysis is increasingly utilised in a broad range of species (Höss et al. 1992, Pompanon et al. 2011).These new methods have been found to recover greater diversity in faecal samples than traditional ones (Soininen et al. 2009).Most studies of diet from faecal material have focused on carnivores (Jarman et al. 2013, Shehzad et al. 2012), however, several studies have shown that the technique can work in herbivores (Soininen et al. 2009, Hibert et al. 2013, Symondson and Harwood 2014).In this study, we use faecal DNA barcoding analysis to characterize the diets of economically and ecologically significant Dorper and Merino sheep (Ovis aries) and the goat (Capra aegagrus hircus).
In order to determine an animal's diet using DNA recovered from its faeces, there must exist a reference database of sequences recovered from potential dietary species, against which sequences recovered from faeces can be compared.DNA barcoding (Hebert et al. 2003) is starting to address this requirement, by building reference databases for all species through the sequencing of standardized genomic regions, while adhering to rigorous standards for collateral data and vouchering of the sequenced specimens in museum/herbarium collections (Mitchell 2008).While the mitochondrial COI gene is a very effective DNA barcode standard for animals (Coissac et al. 2016), there is no single candidate marker in plants with the same characteristics.The DNA barcode standard for plants therefore comprises two chloroplast gene fragments, the ribulose-1,5-bisphosphate carboxylase/oxygenase large-chain (rbcL) and maturase K (matK, CBOL Plant Working Group et al. 2009).RbcL evolves slowly and has been widely used to reconstruct deep evolutionary relationships in plants (Chase et al. 1993).
MatK evolves more rapidly and can distinguish about 70% of plant species (Parmentier et al. 2013).However, matK may lack the conserved sites suitable for PCR primer binding (but see Heckenhauer et al. 2016), limiting its utility for barcoding, and currently there are fewer matK than rbcL sequences available in international databases (~175,000 plant rbcL sequences on Gen-Bank against ~150,000 plant matK sequences at the time of writing).Furthermore, chloroplast and mitochondrial DNA is inherited only through the maternal lineage, limiting their utility in detecting hybridization and introgression events, all too common in plants.
To overcome these issues, the ITS2 region of the nuclear ribosomal RNA genes was proposed as a supplementary DNA barcode marker for plants (Chen et al. 2010, Yao et al. 2010).ITS2 has the advantages that the PCR primers are "universal," and the length of the region is only ~230 bp in land plants, with ~95% of the variation between 100 bp and 350 bp (Yao et al. 2010).This means it is usually short enough to allow its PCR amplification from samples with fragmented DNA.Additionally, as part of the nuclear genome, ITS2 evolves independently of rbcL and matK.
Once a database of standard marker sequences from potential dietary taxa has been assembled, the same marker must be recovered and sequenced from faecal samples, where DNA is fragmented and species-mixed.This requirement is being met by high-throughput sequencing (HTS) technologies, which have proliferated since the turn of the century.Compared to the older Sanger sequencing method (Sanger et al. 1977), HTS technologies focus on sequencing shorter DNA fragments, which are joined together by computer analysis to form longer sequences.This enables fragmented template DNA to be sequenced, and different individual organisms' sequences in a sample can be determined simultaneously.There are a number of HTS technologies currently competing for market share, each with their own strengths and weaknesses (Glenn 2011, Goodwin et al. 2016, Levy and Myers 2016).
Combining these two technology-driven fields, DNA barcoding and HTS, gives us DNA metabarcoding (Taberlet et al. 2012), the ability to construct a species list from a sample of environmental DNA.This is well suited to the task of reconstructing sheep diets from the plant DNA present in their faeces.As of 2014, Australia is the world's second largest producer of sheep, after China, although the size of the Australian flock has declined in the last 10 years (Food and Agriculture Organization of the United Nations, 2017).Merino sheep, having originated in Spain, have been present in Australia since the late 18th century and comprise about 75% of Australia's sheep population (Royal Agricultural Society of Tasmania, 2016).A more recent addition to the Australian flock is the Dorper, which was developed in South Africa in the 1930s by crossing the Dorset Horn and Blackheaded Persian sheep.Dorpers were first brought to Australia in 1996(Dorper Sheep Society of Australia, 2017), where it was thought their drought-tolerant physi-ology would be well suited.While the less selective diet of Dorpers relative to Merinos has been previously noted (reviewed in Brand 2000), no faeces metabarcoding study has yet been conducted to investigate this.Additionally, while research has been undertaken on the management of Dorpers in Australia (Alemseged and Hacker 2014), the potential implications of the broad Dorper diet on endangered flora has not yet been investigated.
This project was designed to test the utility and practicality of DNA metabarcoding for dietary analysis of different livestock breeds.We focused on the Merino and Dorper, as well as goats as a control, in south-western New South Wales (NSW).
Our aims were: 1. Use a DNA metabarcoding approach to compare the diets of three ruminants: Merino sheep (Australia's most popular breed, introduced over 200 years ago); Dorper sheep (a recently introduced breed with a less well characterised diet in Australia) and goats (a control) in the rangelands of south-western NSW. 2. To determine whether Dorpers are a potential threat to vulnerable flora.

Study Area
Dorper, Merino and goat faeces samples were collected from 16 different sheep or goat rearing properties in south-western New South Wales, Australia (Suppl.material 1).To minimise environmental variation between Dorper and Merino sites, they were selected in pairs, with the Dorper and Merino sites in each pair being no further than 50km from one another (with the exception of the Aston site, which was an extra unpaired Merino site).Faeces were collected wet within 30mins of being deposited using flame sterilised tweezers, and stored in ethanol for transport.The sampling sites are in the Murray Darling Depression and Darling Riverine Plains bioregions, as determined by the Interim Biogeographical Regionalisation for Australia 7 (IBRA7).These regions have a semiarid climate with predominately winter rainfall.

The reference DNA barcode library and Sanger sequencing methods
We opted to focus on the ITS2 and rbcL markers in this study.RbcL was included because it is a standard plant barcoding gene, and there already exists a large online database of sequences.ITS2 was included as it evolves more quickly than rbcL, increasing its variability, and because it has a different mode of inheritance (nuclear rather than chloroplast).We chose not to focus on matK due to its smaller online sequence database, and also because its high degree of variability in primer binding sites can decrease the likelihood of successful primer binding, particularly in species-mixed samples with high taxonomic diversity (Heckenhauer et al. 2016, Hollingsworth et al. 2011).To identify the rbcL and ITS2 sequences from our samples, we used two online sequence databases: the Barcode of Life Data Systems (BOLD, Ratnasingham and Hebert 2007) and NCBI's GenBank (Benson et al. 2012).BOLD was launched in 2005 as an online sequence database specifically dedicated to DNA barcoding applications, and it is focused on collecting sequences that are BARCODE compliant, meaning they are of high quality and come from well curated voucher specimens (Hanner 2012).
The GenBank and BOLD databases currently have only a small number of sequences from plants found in the study area.We therefore supplemented the BOLD database with sequences from 24 plant reference samples collected from the study sites (Table 1).These species were chosen as they were the most common plants in the study site that were likely targets for browsers.The complete species diversity of the site, however, is unknown, and there may be many species from the study area that remain unbarcoded.
Of the 24 reference samples collected, five could only be identified to genus.These five genera contain species that occur in the study area and are listed in NSW as threatened on at least one of two different lists; the Australian Environment Protection and Biodiversity Conservation Act 1999 national threatened flora list (Department of the Environment and Energy Commonwealth of Australia, 2018b), and the Buloke Woodlands of the Riverina and Murray-Darling Depression Bioregions list (Department of the Environment and Energy Commonwealth of Australia, 2018a), which is specific to the study area.These species are Sclerolaena napiformis (Endangered), Austrostipa metatoris (Vulnerable), A. nullanulla (Endangered) and A. wakoolica (Endangered) on the national list, and Ptilotus erubescens (Rare/Threatened), P. exaltatus var.semilanatus (Endangered), Austrostipa exilis (Rare), A. gibbosa (Rare), A. puberula (Rare), and Sclerolaena napiformis (Threatened/Endangered) on the Buloke Woodlands list.This makes it possible that some of the specimens included in the reference collection are threatened.None of the reference specimens identified to species level are threatened in NSW.Although we did not attempt to sequence matK in the faeces samples, the matK sequences of reference plant samples were included in this study because they could facilitate species identification at a future date.Published PCR primers for rbcL, matK and ITS2 (Table 2), yielding amplicons of approximately 600, 800 and 460 bp respectively, were used for PCR amplification and sequencing of reference plant samples.
For amplifying rbcL and matK, each 15 μL PCR contained: 2 μL of DNA, 0.025 μL MilliQ water, 1.5 μL of 10x reaction buffer at 10 mM, 1.5 μL of MgCl 2 at 25 mM, 0.3 μL of dNTPs at 10 mM, 0.075 μL of Platinum Taq at 5 U/μL and 0.3 μL each of forward and reverse primer at 5 μM.Cycling conditions for rbcL and matK were as follows: 94°C for 2 min; 94°C for 30 s, 55°C for 30 s and 72°C for 1 min x 5 cycles; 94°C for 30 s, 54°C for 30 s and 72°C for 1 min x 30 cycles; and 72°C for 7 mins.The ITS2 amplification followed the same protocol, except that 35 PCR cycles were performed, and the annealing temperature was 55°C.Sanger sequencing was carried out by Macrogen Inc. (Seoul, South Korea) using a high throughput Applied Biosystems 3730XL sequencer.Peak calling and contig assembly was carried out in Geneious 10.0.2 (Kearse et al. 2012).
We estimated phylogenetic trees for all three genes using the sequences recovered from the reference samples, in order to explore the diversity of the three markers among our reference sequences and to check for any sequence clusters from different taxa with low divergence.All three trees were rooted with a single outgroup sequence from the Scots Pine (Pinus sylvestris), downloaded from GenBank.Phylogenetic analysis was carried out in Geneious 10.0.2 using the MrBayes v3.2.6 plugin (Ronquist et al. 2012).DNA substitution models for the matK, rbcL and ITS2 genes were estimated in ModelGenerator v0.84 (Keane et al. 2006), using the model with the minimum negative log likelihood under the Akaike Information Criterion 1. Posterior distributions of parameters were estimated using Markov Chain Monte Carlo sampling.All three analyses were carried out with a chain length of 1,100,000 with samples drawn every 200 steps.The first 100,000 steps were discarded as burn-in.Convergence was tested for by verifying that the standard deviation of split frequencies had dropped below 0.01 in each case.

High-Throughput Sequencing (HTS)
Twenty-four faecal samples were chosen to include an approximately equal number of the three treatments (Meri-no, Dorper or goat) from across the study area (Table 3).Two pairs of faeces samples came from the same sheep: sample 14 and 19 were from the same Dorper, and sample 15 and 20 were from the same Merino.These duplicate samples served as internal controls for among-sample variation in DNA sequence diversity.Overall, the 24 samples comprised eight Merino plus one repeat, seven Dorpers plus one repeat and seven goat samples.DNA extractions were performed at the Australian Museum using both a Bioline Isolate II Plant DNA Isolation kit and a Qiagen DNeasy (Animal) Tissue Kit.
All DNA extractions were quantitated using a Nanodrop spectrophotometer.The Qiagen kit yielded DNA extractions at higher DNA quantity and purity, so these DNA extractions were sent to the Ramaciotti Centre for Gene Functional Analysis (University of New South Wales, Sydney) for further processing.
Published plant DNA metabarcoding studies either have sampled only fresh plant material to generate large PCR amplicons for DNA sequencing, or have utilised alternative chloroplast genes, such as trnL, for which tried and tested primers yielding small fragments were available (Valentini et al. 2009).Short amplicons are favoured in faecal metabarcoding studies as DNA of dietary organisms is fragmented during digestion, and short markers are more likely to be successfully amplified in the fragmented DNA (King et al. 2008).No published primers could be found that targeted amplicons of the appropriate sizes (<300 bp for faecal samples, King et al. 2008) for the plant DNA barcode standard genes, matK and rbcL.Therefore a new PCR primer (rbcL-AM2f) was designed to amplify a 247 bp fragment of rbcL, when used in combination with one of the standard barcode primers, rbcLajf634r (Fazekas et al. 2008).RbcL-AM2f was designed manually in Geneious, using rbcL sequences from the 24 reference samples as a reference.The primer was tested with the rbcLajf634r reverse primer on the 24 reference samples and the fragment amplified successfully in all of them.The source, sequences and expected amplicon size of the PCR primers utilised for the HTS part of this study are given in Table 2. HTS was performed by the Ramaciotti Centre using the Illumina MiSeq platform.The Nextera XT 24-sample preparation kit was used because it facilitates indexing of 24 separate samples, i.e. each sample is labelled with a unique DNA sequence "index" which allows all the sequences from a particular sample to be separated after the sequencing run.The Illumina MiSeq Nano version 2 sequencing kit was used, allowing sequencing of 250 bp from each end of an amplicon.As both amplicons were less than 500 bp long, this allowed for bidirectional sequencing of each template, ensuring that high quality sequence data was obtained.

HTS data analysis
The Illumina high-throughput sequencing output consisted of 1,110,113 forward and reverse reads over the 24 samples, averaging 46,255 reads per sample (number per sample can be found in Suppl.material 4).Raw Illumina reads were imported into Geneious 10.0.2, filtered for length (>35 bp) and ends of sequences with stretches of Ns or quality lower than 14 were trimmed.In Geneious, bidirectional reads were merged, and any reads that failed to merge were discarded.Reads were then separated into rbcL and ITS2 reads by searching for ITS2 primer sequences within the reads in Geneious.The separated reads were then pooled across all samples and screened for Chimeras using the de novo uchime2 algorithm implemented in USEARCH v9.2.64 (Edgar et al. 2011).
The non-chimeric reads were assembled into contigs in Geneious.Sequences had primers trimmed with a maximum of 2% gaps and maximum 2 bp gap size.Maximum mismatches per read was 2% for ITS2 and 1% for rbcL, due to the higher degree of variation expected in the ITS2 gene (Chen et al. 2010).Following assembly, rbcL contigs with a length longer than 300 bp and ITS2 contigs with a length shorter than 300 bp were excluded.Of the ITS2 contigs, 77.1% of those excluded for being under 300 bp were also under 50 bp in length.As they matched well with primer sequences they were likely primer dimers.Five (or where there were fewer than five, all) of the remaining 22.9% of excluded ITS2 contigs from each sheep were BLASTed to determine what they matched to.In all cases, these sequences either had no match, or matched to ribosomal RNA sequences not in-cluding ITS2, or overlapped less than 100 bp with an ITS2 sequence.We therefore excluded these sequences as none of those tested contained complete ITS2 sequences.ITS2 contigs were screened for 5.8S and 28S sequences, using the 'Annotate from Database' function in Geneious.Annotations were made using an annotated plant sequence downloaded from GenBank, from the species Canella winterana (sequence L03844.1) with annotations made at 40% similarity (counting ambiguities as mismatches).Annotated sections were then removed, leaving only the ITS2 sequence.Contigs with a read depth of less than 10 were excluded in both genes in the main analysis, following the 10-minimum read depth protocol of Hibert et al. (2013) and Gebremedhin et al. (2016).We also carried out a supplementary analysis with all the same conditions but at a minimum read depth of 3.

Sequence Identification
Sequence identification using the GenBank database was conducted online, using BLASTN 2.6.0 (Zhang et al. 2000).Identification using BOLD was conducted using the MegaBLAST function in Geneious, searching a downloaded database of all BOLD ITS2 and rbcL sequences that was supplemented by our reference database sequences.
Match data from GenBank, consisting of the 100 closest matches for each contig, were then imported into ME-GAN6 Community Edition v6.7.0 (Huson et al. 2016) for the assignment of the lowest common ancestor (LCA) of the best matches.The default LCA parameters were used, however, the "minimum percent identity" was changed to 1% in ITS2 and 0.005% in rbcL, to account for the relatively higher inter-specific variation in ITS2 (Chen et al. 2010).Match data from BOLD consisted of the 100 closest matches, which was then restricted to matches with 100% query coverage and a minimum percentage identity, 98% for ITS2, 99% for rbcL, again to account for the more variable ITS2.Identifications were then assigned for each contig based on the lowest taxonomic level in common, up to family, among all their matching database sequences.This ensured that sequences that could not be identified to species could still be identified to family, but those sequences not identifiable to family were excluded from further analysis.
All sequences identified from each database were then collated into an alignment, one for BOLD identifications and one for GenBank, and neighbour-joining trees produced using the Geneious Tree Builder, with default settings.These trees were used to help determine the identity of OTUs that received no hits in either the BOLD or GenBank databases.
IDs assigned to taxa were compared against the Atlas of Living Australia (ALA) website (Atlas of Living Australia 2017).If the assigned species did not occur in the study zone, or within 500km of it, the ID was changed to 'Genus sp.', and all such instances are underlined in our results tables.We close to include these matches to taxa that occurred outside the study zone, rather than exclude them entirely as in Sugimoto et al. 2018, as the reference database for our study area is not complete and we felt that matches outside the study area that were congeneric with plants within it were therefore still worth noting.If no member of the genus occurred in the study area, the sequence was only identified to family (this only occurred once, in the BOLD rbcL data).

Sheep Diet Analysis
We performed statistical tests on the GenBank sheep diet data to investigate similarities and differences between the breeds.We performed two-tailed t-tests in R v3.3.3 (R Core Team 2017) between the Dorper and Merino samples, based on the family and species diversity in each sample.Data was tested for normality and equality of variance using the Shapiro-Wilk test in R, and Levene's test in the R package 'car' v2.1-6 (Weisberg and Fox 2011).We also performed two ANOSIMs at the family level using the package vegan (Oksanen et al. 2017) in R, one using proportional read data and one using presence/ absence data, in both cases using the breed as the independent variable, to test for differences in the proportion of plant families in the diets of the three taxa.We also calculated the Sørensen similarity index for all three pairs of taxa at the species level.Finally, we used the Kruskal-Wallis test implemented in the R to compare the total number of contigs to the number of contigs that could be identified to species in each taxon.

Data resources
The metabarcoding datasets generated and analysed during the current study are available at Figshare: http:// doi.org/10.6084/m9.figshare.5309935 The plant reference database generated and analysed during the current study is available at BOLD: http://doi.org/10.5883/DS-SMPRS

DNA barcode reference library
Sequences were obtained for the rbcL, matK and ITS2 genes for 18, 16 and 15 of the 24 plant reference samples respectively.
Phylogenetic trees were derived from the slower evolving rbcL gene (Fig. 1) and the faster evolving matK gene (Fig. 2).There were several groups of identical sequences from samples identified morphologically as different species, in rbcL (T10 and T17; T03 and T08, and T11 and T24) and matK (T01 and T10).MatK provided better resolving power than rbcL, however matK sequences were also unable to distinguish clearly between all samples.A phylogenetic tree was also derived from the nuclear ITS2 gene (Fig. 3).In the ITS2 dataset, no sequences were identical, which might make this marker more useful in identifying sequences to species.In all three trees, specimens identified morphologically as from the same family clustered together with maximum posterior probability in all cases except Fabaceae in rbcL, where it was 0.92.

High-throughput sequencing of faecal samples
De novo chimeric sequence detection using the UCHIME algorithm flagged 21.4% of reads as chimeric in ITS2 and 42.1% as chimeric in rbcL.Following their removal, the non-chimeric reads were assembled into contigs.These ranged from 66-761 contigs per sheep for ITS2 and 119-1060 contigs per sheep for rbcL.
A number of matches on both the GenBank and BOLD databases, particularly in the rbcL gene, were for unicellular algae that were likely present in the diet as water contaminants.In all analyses, algal sequences were excluded.In the following, singletons (species or families appearing in only one sample) were also excluded.Some specimen identifications were changed to "Genus sp." based on available information about taxon ranges from ALA.This accounted for the incompleteness of the reference databases, so that matches to species that did not occur within the study zone were instead listed as matching only to genus level.All such cases are clearly marked in Table 4 and Suppl.materials 6, 13, 15.In one case, Aethionema sp. in the BOLD rbcL data, the genus did not occur in the study area and the match was changed to 'Brassicaceae species'.

GenBank Results
Generally, Dorpers were found to have more diverse diets than Merinos.Based on the GenBank results, at the family level, there were 11 families present in the diets of both breeds, three families found in Merinos not Dorpers, and seven families found in Dorpers not Merinos (Table 5).At the species level, there were 15 species in common between the breeds, six species found in Merinos not Dorpers and 12 species found in Dorpers not Merinos (Table 4).At the family level, the goat diet includ-ed 13 families, which was fewer families than in either Merinos or Dorpers.One of these families was unique to goats, two were shared only with Dorpers, and ten families were common to all three taxa.At the species level, the goat diet was also less varied than either sheep, having only 18 species present.Four of these species were unique to goats, two were shared only with Merinos, two were shared only with Dorpers, and 10 species were common to all three taxa (Table 4).Based on the GenBank results, the Sørensen similarity index, at species level between the Merinos and Dorpers was 62.5%.Between Dorpers and goats it was 53.3% and between Merinos and goats it was 61.5%.This indicates that although the Dorper diet is more diverse than Merino diet, those two diets are more similar to one another than either is to the goat diet.

Threatened Species
The list of species found in the Dorper and Merino faeces based on Genbank results was compared to the national list of Australian threatened taxa (Department of the Environment and Energy Commonwealth of Australia, 2018b), and the list of threatened taxa in the Buloke Woodlands region, an endangered habitat in the Riverina and Murray-Darling depression bioregions occurring in and around the study site (Department of the Environment and Energy Commonwealth of Australia, 2018a).Threatened taxa found in the diets of sheep were Minuria cunninghamii (Dorper only, listed as 'rare') and Sida fibulifera (Dorper, Merino and goat, 'vulnerable') which are threatened in the Buloke Woodlands region.These were species-level matches, but there were also some genuslevel matches to threatened species, for sequences identified only to genus level.On the national list, these were Sclerolaena napiformis (Merino only, 'endangered'), Calotis moorei (Dorper, Merino and goat, 'endangered'), and Maireana cheelii (Dorper and Merino, 'vulnerable').On the Buloke Woodlands list, these were Sclerolaena napiformis (Merino only, 'threatened/endangered'), Maireana cheelii (Dorper and Merino, 'vulnerable'), M. excavata (Dorper and Merino, 'vulnerable') and M. rohrlachii (Dorper and Merino, 'rare').

BOLD Results
BOLD database searches yielded very similar results to the GenBank results (Suppl.materials 5, 6).At the family level, there were eight families present in the diets of both sheep breeds, one family found in Merinos not Dorpers, and five families found in Dorpers not Merinos.At the species level, there were 12 species in common between the breeds, four species found in Merinos not Dorpers and seven species found in Dorpers not Merinos.At the family level, goats were found again to have lower diversity than either Merinos or Dorpers, with seven families present.One of these families was unique to goats, three were shared only with Dorpers, and seven families were common to all three taxa.At the species level, the goat diet also was less varied than either sheep, having only seven species present.None of these species was unique to goats, one was shared only with Merinos, one was shared only with Dorpers, and five species were common to all three taxa.

Species and Family Accumulation Curves
To estimate the completeness of taxon sampling, taxon accumulation curves at the family and species level were constructed for each taxon separately based on GenBank results.The species and family accumulation curves indicate that more samples are needed to estimate the total diversity of Dorper (Suppl.material 7), Merino (Suppl.material 8) and goat (Suppl.material 9) diets, except in two cases-the goat and Merino family-level graphs both asymptote at 13 and 14 families respectively.This indicates that while our sampling of Merinos and goats was probably sufficient to estimate family-level diversity, family-level diversity for Dorpers and species level diversity in all taxa were likely underestimated.

Diet diversity in individual sheep
Dorper and Merino diet data at the species and family levels were found to be normally distributed and with equal variance (Suppl.material 10), and t-tests were therefore selected as appropriate to compare their means.
The diversity of the individual Dorper and Merino diets were not significantly different at the species level (t = 0.713, df = 15, p-value = 0.487) nor at the family level (t = 0.579, df = 15, p-value = 0.571).
Among the breeds, the ratio of the total number of contigs to the number of contigs that could be identified to species in each individual was compared.For the ITS2 data, the Kruskal-Wallis test was used, as the data failed the Shapiro-Wilk test for normality, while the ANOVA was used for the rbcL data as this data passed normality and equality of variance tests (Suppl.material 11).In both cases there was no significant differences among the breeds (ITS2, Kruskal-Wallis test, χ 2 = 0.327, df = 2, p-value = 0.849; rbcL, ANOVA, df = 2, F value = 0.085, p-value = 0.919, residual df = 21).

Proportion of Reads in Diet (at family level)
Previous studies have used the proportion of reads from different families recovered in faecal metabarcoding as a proxy for the quantity of different foods in the subjects' diets (Kartzinel et al. 2015, Lopes et al. 2015, Soininen et al. 2015).This analysis was based on the ITS2 GenBank data, as it exhibited the highest diversity.
Our results are highly variable among individuals, but Malvaceae appears to be the dominant family in all three taxa, accounting for between 11% and 34% of the reads (width of standard error, Fig. 4).Amaranthaceae was the next most prevalent among sheep, accounting for between 12% and 28% of the reads, but only 4-8.5% in goats.Standard error was high among samples, indicating the high level of variability in the diet among individuals.ANOSIMs were non-significant using both proportional (R = -0.026,p = 0.632) and presence/absence data (R = -0.061,p = 0.852).The R values close to zero indicate that the most similar samples were found both within-groups and among-groups.

Supplementary analysis at minimum read depth of 3
Decreasing the minimum read depth to 3 increased the number of taxa recovered from the diets of all three study organisms.In the GenBank analysis at a minimum read depth of 3, the sheep have 18 species in common, 12 unique to Merinos and 15 unique to Dorpers; in the BOLD analysis at a read depth of 3, the sheep have 20 species in common, 6 unique to Merinos and 15 unique to Dorpers (Suppl. materials 12,13,14,15).These analyses were similar to the 10 minimum read depth analysis, in that Dorpers exhibited the greatest diet diversity, followed by Merinos and then goats.

Discussion
Both species and family level results show that, when considered at the level of the breed, in aggregate the diet of Dorpers is more varied than that of the Merinos, with family and species level diversity being 29% and 24% higher in Dorpers respectively, based on GenBank results.However, t-test results show that, at the individual sheep diet level, Dorpers do not have a more varied diet than Merinos.This result means that, on average, while an individual Dorper does not have more plant taxa in its diet than an individual Merino, the diet of Dorpers varies more from individual to individual than does that of Merinos.This is consistent with a behavioural model in which Dorpers eat whatever food is easily available without discriminating, while Merinos are more likely to seek out a more restricted set of food plants.This corroborates the results reviewed by Brand (2000) who found that "The Dorper… utilised a larger number of different plant species than Merinos", and that "Dorpers walked less to select food, or a suitable spot to graze".Goats were found to have a less varied diet than either breed of sheep, and the two sheep breeds were found to have a more similar diet to one another than either do to goats, as goats are not eating many of the species that Merinos and Dorpers have in common.
Threatened taxa were found in the diets of both Merinos and Dorpers.Species level matches were found for two taxa in the Dorper diet (Minuria cunninghamii and Sida fibulifera) and one taxon from the Merino diet (S. fibulifera).Genus level matches were also found for five species in the Merino diet, and four species in the Dorper diet, indicating that the diets of these sheep may include more threatened taxa.This highlights the need for barcode sequences from threatened taxa to be uploaded to online databases to facilitate their detection in metabarcoding studies.
The rbcL, matK and ITS2 sequences could only be successfully amplified in a subset of reference samples.Poor DNA preservation in the plant tissue samples may have been a factor in PCR failure, since trial PCRs of the small amplicons designed for Illumina sequencing were successful for rbcL with all 24 samples.However, possible primer-template mismatch cannot be discounted either.
There were several instances of zero or low variation between reference samples from different species, although this is not uncommon in plant DNA barcoding studies (Hollingsworth et al. 2011).This low variation may indicate these samples were misidentified morphologically, and are really of the same species.To assess this possibility each sequence was used in a BLAST search of the BOLD and GenBank databases to find their closest matches (Suppl.material 2, Suppl.material 3).Unfortunately, the BOLD database currently contains relatively few plant sequences (95,000 for rbcL, 97,000 for ITS2 and 70,000 for matK, versus almost 5 million COI barcodes for animals) and none of the genera sampled in this study have yet been sampled comprehensively, which complicates interpretation of the data.Examination of Suppl.material 2 suggests that some identifications may have to be reassessed, e.g. the notoriously difficult Austrostipa vs. Nasella.However, many of the plant sequences on BOLD are draft sequences that have not yet been published, and also may have been incorrectly identified (e.g.Chenopodium ww08577, an rbcL match on the BOLD database, is an unpublished sequence which is likely to actually be a species of Atriplex based on a number of very close matches to Atriplex sequences in BOLD).This identification also accords with our samples T24 and T11, identified using morphology as Atriplex species, and for which both rbcL and ITS2 data matches Atriplex on BOLD.Sequencing of expertly-curated herbarium material is needed to establish firmly both the identity of each species and whether a supplementary marker can reliably distinguish these species.
Although the proportion of reads in the diet metabarcoding dataset is unlikely to match exactly the proportions by volume of plant matter in the diet, there is a relationship between the two (Thomas et al. 2013, Batovska et al. 2017).Using proportions of reads as a proxy for diet content has been used previously in herbivores (Ando et al. 2013).The proportional read data indicates that the three taxa have a similar 'core' diet at the family level, with 55% of the reads being accounted for by Malvaceae, Fabaceae, Amaranthaceae and Aizoaceae in all three taxa.Malvaceae was the most common family in the diet of all three taxa with >20% of the reads in each case.While Dorpers do eat a more diverse diet than Merinos, the additional taxa that they are eating do not form a large part of their diet.The eight families unique to Dorpers together account for only 8-19% of the diet, less than the single most prevalent family, Malvaceae.Brand (2000) reviewed a range of studies that indicated that grasses account for a large part (if highly variable, from 10-75%) of the Merino and Dorper diet, and a greater part of the Merino diet relative to Dorpers.In the present study, grasses accounted for only 2-6% of the reads in the diets of all three taxa, with all three within standard error of one another.
The goat diet was found to be less diverse than the diets of either sheep, and broadly in line with results of previous studies.Goats are known to consume more woody plants and fewer herbaceous plants than sheep (Castro andFernández Núñez 2016, Bartolomé et al. 1998).In this study, when compared to the sheep, goats were found to consume more plants from the Myrtaceae family of woody plants (although this varied widely among individual goats) and less from the family Amaranthaceae, which are mainly herbaceous.
The range of plants available in the study area at the time the samples were taken may go some way towards explaining these results.The region received more than double the median rainfall during the two years preceding the collection of the dung samples (2010 and 2011).For this reason, plant growth and vegetation diversity were very high, with both Malvaceae and Poaceae being abundant.Merinos may therefore have been able to select only their preferred species as they are also known to consume some of the species found in Dorper-only dung when their preferred species are not present, such as during drought conditions.
It should be noted that, while our proportional reads analysis based on GenBank data showed that Malvaceae was the largest component of the diet of all three taxa, Malvaceae was almost entirely absent from the BOLD results.This was due to the difference in analytical approach between the two parts of our study.The Malvaceae component of the diets of all three taxa mainly comprised species of the genus Sida.The Sida sequences on BOLD were shorter than the contigs in our dataset, and as they had less than 100% query coverage, under the method we used in our BOLD analysis those matches were excluded.Matches to Abutilon, which was less common, did result in a BOLD match.Care must be taken in interpreting metabarcoding results, as difficulties arise when one is working with reference databases where some but not all of the species that occur in the study area are represented (Zepeda Mendoza et al. 2015).In that case, when one searches the barcode library with a query sequence, the closest matching sequence in the library (even if it is an exact match for a slowly evolving gene like rbcL) may not be the same species, or even a congeneric species.Indeed, Wilkinson et al. (2017) caution that the proportion of species-unique rbcL + matK barcodes is likely to drop even further as sampling increases.To make a judgement call about species identity, one has to know what closely related species are found in the sampling area, and whether those species are represented in your reference library.
Duplicate samples were not highly similar.Based on the GenBank data, the two repeated Dorper samples had 10 species in common, and 18 species possessed by only one sample (Sørensen similarity index = 52.6%).The two repeated Merino samples had 5 species in common, and 11 species possessed by only one sample (Sørensen similarity index = 47.6%).This highlights the value in taking multiple samples from a single individual, as diets are likely to be highly variable.This has been observed previously in another herbivore, the Pacific pocket mouse (Iwanowicz et al. 2016).Only a single sample was taken from each faecal pellet-it might also be useful to compare subsamples from a single pellet to determine its degree of consistency.Regarding the source of the duplicate dung samples, while we are confident that each sample in a vial is of the same sheep, we cannot be absolutely certain of this as there is a small chance that two or more animals in a mob could have dropped their dung at the same spot.Additional sampling, and the use of degenerate primers like the rbcL-AM2f primer used in this study, might also help to overcome PCR amplification bias, which can skew the proportion of reads from different taxa in metabarcoding HTS results (Krehenwinkel et al. 2017, Pawluczyk et al. 2015).In our supplementary study increasing the sensitivity of our analysis by lowering the minimum read depth to 3, the number of families and species recovered in Merinos, Dorpers and goats were higher then when using a minimum read depth of 10, but the relative levels of diversity among taxa were maintained.
As sequencing technologies become cheaper and provide more data, the bottleneck in metabarcoding studies becomes whether there exist complete and reliable databases of sequence data from target organisms in the study area.Diet metabarcoding studies on Australian herbivores would be greatly improved by increased taxon sampling and sequencing of Australia's flora.Our results show that much of the diet of the sheep and goats in the area was not covered by our small reference collection, as few barcode IDs matched to our reference database.Incomplete reference collections remain a challenge in carrying out metabarcoding studies (Zepeda Mendoza et al. 2015), and a reliable and complete database of Australian plant taxa would have greatly improved the accuracy of our identifications.In particular, such a database should include those plants listed on threatened species lists to facilitate the identification of herbivores that are potential threats.
Plant barcoding remains challenging for several reasons.Firstly, while the COI gene in animals, being variable, easily amplified and easily aligned, is an excellent standard barcoding region, no single such region exists in plants.There is some diversity of opinion on what plant barcode region is most useful, and most researchers use more than one region at once (Hollingsworth et al. 2016).This inconsistency makes it more difficult to develop and use online sequence databases to identify plant sequences.Including a more variable gene such as trnL in this study might have enabled us to recover more matches in the faecal samples, however, more complete trnL reference databases are needed to facilitate the identification of recovered sequences.
Genome skimming is a relatively new technique that has the potential to circumvent some of the present difficulties in plant metabarcoding (Dodsworth 2015).First introduced by Straub et al. (2011) in the context of plant systematics, genome skimming provides much more sequence data than barcoding, and might represent a new direction in identifying plant samples from molecular data (Hollingsworth et al. 2016, Wilkinson et al. 2017).

Conclusions
The diet of Dorper sheep was found to be more diverse than the diet of Merino sheep.Although the diet of Dorpers generally was more diverse, individual Dorper sheep do not seem to eat more taxa than Merino sheep, but they do appear to be less selective.The diets of the two sheep breeds were more similar to one another than they were to the goat diet.The proportions of different plant families present in the diets of all three animals show a "core" diet of four plant families, common to all three taxa and accounting for over 55% of the reads.The diversity in the Dorper diet is mainly accounted for by taxa consumed in low quantities.More sampling is required to get a full picture of the diversity of these animals' diets.
Our preparation of a small reference database, and comparison of this database against the online BOLD and GenBank databases, emphasizes that online databases need to be more complete to get a more accurate picture of diversity present.A more complete database would also increase the usefulness of the ITS2 dataset, making it generally more useful than rbcL for species-level discrimination.
Finally, Dorpers might be a threat to the vulnerable plant species Minuria cunninghamii and Sida fibulifera, with Merinos also being a potential threat to S. fibulifera.The diets of both sheep breeds also contained taxa identified only to genus, which might potentially include threatened taxa.This highlights that Dorpers could potentially have as much of an impact as Merinos on threatened species.

Figure 1 .
Figure 1.Bayesian tree of rbcL gene sequence data.Numbers at nodes are posterior probability values.Scale bar indicates substitutions per site.The tree was rooted with a single Pinus sylvestris sequence downloaded from GenBank (accession number AB097775.1).

Figure 2 .
Figure 2. Bayesian tree of matK gene sequence data.Numbers at nodes are posterior probability values.Scale bar indicates substitutions per site.The tree was rooted with a single Pinus sylvestris sequence downloaded from GenBank (accession number AB097781.1).

Figure 3 .
Figure 3. Bayesian tree of ITS2 sequence data.Numbers at nodes are posterior probability values.Scale bar indicates substitutions per site.The tree was rooted with a single Pinus sylvestris sequence downloaded from GenBank (accession number KX167560.1).

Figure 4 .
Figure 4. Proportion of reads of each plant family in the ITS2 data from each taxon, based on matches from GenBank.Error bars represent standard error.

Table 1 .
Plant samples used for DNA barcode library construction.Botanical name is the initial identification before DNA sequencing.See DNA barcode results in Suppl.material 2 and Suppl.material 3.

Table 3 .
Sheep faecal pellet samples used.Locations of properties can be found in Suppl.material 1.

Table 4 .
Species level taxa (GenBank Data).Names of taxa with asterix were changed based on the distribution of taxa in the study zone.