Development and assessment of a new method for combining catch per unit effort data from different fish sampling gears: multigear mean standardization (MGMS)

Fish community assessments are often based on sampling with multiple gear types. However, multivariate methods used to assess fish community structure and composition are sensitive to differences in the relative scale of indices or measures of abundance produced by different sampling methods. This makes combining data from different sampling gears and methods a serious challenge. We developed a method of combining catch per unit effort data that standardizes catch per unit effort data across gear types, which we call multigear mean standardization (MGMS). We evaluated how well MGMS and other types of standardization reflect underlying community structure through a computer simulation that generated model riverine-fish communities and simulated sampling data for two gears. In these simulations, combining sampling observations from two gears with MGMS produced community structure estimates that were highly correlated with true community structure under a variety of conditions that are common in large rivers. Our simulation results indicate that the use of MGMS to combine data from different sampling gears is an effective data manipulation method for the analysis of fish community structure.


Introduction
For the vast majority of aquatic ecosystems, no single sampling method is adequate for assessing all species and life stages of fishes (Weaver et al. 1993; Willis and Murphy 1996). Most sampling gears for fishes are selective for certain size ranges or species and are not equally effective in all habitats (Hayes 1996; Hubert et al. 2012; Chick et al. 1999). As a result, the use of multiple gears is common in surveys of fish communities across a range of aquatic ecosystems, such as lakes (Weaver et al. 1993; Fago 1998), rivers and streams (Peterson 1989; Lapointe et al. 2006; Chamberland et al. 2014; Broms et al. 2016), estuaries (Rozas and Minello 1997), or coral reefs and mangroves (Acosta 1997). Despite the fact that many surveys and monitoring programs use multiple gear types to sample fishes, the resulting data are usually analyzed separately for each gear.
The need to use multiple gear types to adequately sample a fish community is particularly acute when these data are used to assess fish community structure with similarity- or dissimilarity-based techniques such as analysis of similarity (ANOSIM) or nonmetric multidimensional scaling (Clarke 1993; Mumby et al. 2004; Chick et al. 2006). Data from surveys are usually reported as catch per unit effort (CPUE), rather than a direct measure of abundance. Data for the same species in different gear types usually will have different measures of effort, and the values of CPUE and the scale of CPUE may differ greatly between gears. If differences in the scale of CPUE data among gears are not standardized, then an analysis of a combined index will be dominated by the gear with the greatest mean values of CPUE. Combining data from multiple gears would be less problematic if CPUE data could be converted to numbers or biomass per unit area, but the statistical relationship between CPUE and true abundance, known as catchability, has yet to be developed for many gears, habitats, species, or life stages and size classes of fishes (Arreguín-Sánchez 1996; Richards and Schnute 1986; but see Lauretta et al. 2013).
For studies focusing on population trends or stock assessment of individual species of fish, there are several examples of methods to combine information from multiple sampling gears. A straightforward method is to sum CPUE data from multiple gears (e.g., Hinch et al. 1991). Combined CPUE data from multiple gears have been referred to as "standardized catch per unit effort (sCPUE)", with standardization being a combination of either extrapolating data such that effort in time for different gears is equal or assuming that samples from different gears have equal effort despite differences in sampling time among gears (Phelps et al. 2009, 2014, 2015). In general, however, there is no accepted justification for standardizing by the amount of fishing time when combining an active method, such as electrofishing, with a passive method, such as hoop nets. Marine stock assessment studies often create a new index of the abundance of the population of interest by combining data from multiple fishing gears after data for each gear are placed on a relative scale (e.g., by setting the mean or maximum value of each gear to 1 or 100; Conn 2010). Conn (2010) assessed such approaches and made recommendations for adjusting for differences in sampling variance among gears. When researchers are primarily interested in assessing patterns in fish community structure through time and (or) among locations, the task of combining data from multiple gears is more complex because community structure data are a combination of variation in the abundance of each species across observations (i.e., variation in absolute abundance) and variation in the abundance of each species within observations (i.e., relative abundance or species dominance).
Standardization techniques are available that convert data either among species within an observation or among observations within each species. Weaver et al. (1993) converted CPUE data from multiple gears to relative abundance (i.e., among species within an observation). Converting CPUE data to relative abundance successfully standardizes CPUE data from multiple gears to the same relative scale, but also creates different patterns among observations within a species compared with the original CPUE data. Data sets of CPUE from different gears could be put on the same relative scale by (i) normalizing each data set within species and across observations to mean = 0 and variance = 1, (ii) dividing the CPUE for each species in each observation by the maximum CPUE for that species across all observations, a method we term SpeciesMax, or (iii) setting the mean of each species among samples to an arbitrary value (e.g., 1 or 100). The drawback to all of these standardizations is that the patterns of relative abundance among species within an observation will be altered. Additionally, rare species become equally weighted with abundant species, which may or may not be desirable depending on the specific goals and objectives of the analysis. For example, if CPUE data from multiple gears are combined using differential weighting (e.g., adjusting for the total area of habitat available to be sampled by each gear), the abundance patterns between species in an observation should be maintained to prevent confounding the overall weighting for each gear. An additional potential drawback to SpeciesMax standardization or converting CPUE to relative abundance is that both adjustments result in data restricted to a closed ratio (i.e., 0 ≤ x ≤ 1), which in some instances creates artificial correlations among species (Jackson 1997).
We devised a method to combine data from multiple gears that preserves both patterns of relative abundance among species within observations and patterns of abundance within species among observations. This method standardizes data from multiple gears using a mean centering method, so we refer to this method as multigear mean standardization (MGMS). We used a simulation model to create communities of 18 species of fish and simulated sampling these communities using electrofishing and mini-fyke nets. Our goal was to compare the accuracy of four methods of combining CPUE data (MGMS, sCPUE, relative abundance, and SpeciesMax) using parameters that reflect the distributions of large river fishes and an analysis framework consistent with multivariate techniques based on similarity-dissimilarity measures of community structure.

MGMS
Calculating MGMS begins by standardizing the CPUE data for each gear using a form of mean centering. (See online Supplementary Material, File S1 "MGMS_Calculations.xlsx", for an example of calculations.) First, the total catch (TC) of all i species in each observation j per unit of effort e is calculated as ($\mathrm{TC}_j/e$). Next, for each gear, the mean total catch per unit effort ($\overline{\mathrm{TC}}_j/e$) is calculated. To standardize the data for each gear, the CPUE of species i in observation j ($c_{ij}/e$) is divided by the mean total catch per unit effort across all observations, yielding:

$$\mathrm{MSC}_{ij} = \frac{c_{ij}/e}{\overline{\mathrm{TC}}_j/e} \qquad (1)$$

where $\mathrm{MSC}_{ij}$ is mean standardized catch of species i in observation j. Note that the units of effort e associated with the gear are cancelled out in the calculation of $\mathrm{MSC}_{ij}$, and both the patterns of relative abundance of species within an observation and relative abundance of a species across observations are preserved. See File S2 for details of how eq. 1 was derived. Once CPUE data for each gear are converted to $\mathrm{MSC}_{ij}$, they can be combined across gears (e.g., $\mathrm{MSC}_{\text{fyke}} + \mathrm{MSC}_{\text{electrofishing}}$), and the resulting sums provide the basis for multivariate analysis. For example, the simplest method of combining data from multiple gears would be simply to add $\mathrm{MSC}_{ij}$ data from each gear together for each species from each sampling location or time (sensu Hinch et al. 1991; Jackson and Harvey 1997).
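The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation (the study used SAS); the function names `mean_standardized_catch` and `combine_mgms` and the small data set are hypothetical, and each gear's data are assumed to be a list of observations, each holding one CPUE value per species.

```python
def mean_standardized_catch(cpue):
    """Divide every CPUE value for one gear by that gear's mean total CPUE.

    cpue: list of observations; each observation is a list of CPUE values,
    one per species (illustrative layout, not from the paper).
    """
    totals = [sum(obs) for obs in cpue]          # total CPUE per observation
    mean_total = sum(totals) / len(totals)       # mean total CPUE for the gear
    if mean_total == 0:
        raise ValueError("gear caught nothing; MSC is undefined")
    return [[c / mean_total for c in obs] for obs in cpue]


def combine_mgms(*gears):
    """Sum mean standardized catch across gears, observation by observation."""
    standardized = [mean_standardized_catch(g) for g in gears]
    n_obs = len(standardized[0])
    n_species = len(standardized[0][0])
    return [[sum(g[j][i] for g in standardized) for i in range(n_species)]
            for j in range(n_obs)]


# Hypothetical CPUE for 3 observations x 2 species; the two gears differ in scale.
electro = [[10.0, 30.0], [20.0, 20.0], [30.0, 10.0]]   # mean total = 40
fyke = [[1.0, 0.0], [2.0, 1.0], [1.0, 1.0]]            # mean total = 2
combined = combine_mgms(electro, fyke)
```

Because each gear is divided by a single constant, the standardized totals for every gear average 1 across observations, which places the two gears on a common scale before they are summed.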

Model overview
Our objective in creating our model was to allow a comparison of the efficacy of MGMS standardization compared with other standardization methods in the context of analyzing community structure with similarity- or dissimilarity-based multivariate techniques. All modeling and analysis were conducted using SAS for Windows (SAS Institute, Inc. 2013). We began by creating known fish communities so that we could compare different methods of combining sampling gears against true values. We wanted these known communities to emulate real communities of fish, so we used the Upper Mississippi River System (UMRS) as a template. Six reaches of the UMRS are sampled extensively with multiple gears by the Long Term Resource Monitoring element (LTRM) of the US Army Corps of Engineers Upper Mississippi River Restoration Program (Ratcliff et al. 2014). The primary goal of LTRM fish monitoring is to generate reach-wide means of CPUE across multiple habitat strata for each sampling gear using a stratified-random sampling design.
We generated known fish distribution patterns for 18 species across 21 river reaches and generated sampled fish distribution patterns for two gears in six of these reaches. In the UMRS, not all species occur in all six of the LTRM study reaches, and we wanted the model to simulate realistic distributions of species across river reaches. To be sure our simulated data would reflect both common and less common species in the UMRS, we examined distribution patterns for 56 species of fish that were captured in at least 8 out of the 10 years (1994–2003) of LTRM monitoring. Of these 56 species, 64% occurred in all six reaches, 18% occurred in four or five, 14% occurred in two or three, and 4% only occurred in one. We randomly generated 100 beta probability distributions defined on the interval (0, 1) across the 21 simulated river reaches and constrained the curves such that the distribution across the six reaches sampled would match the percentages listed above. We used the beta function ($D = x^{\alpha}(1 - x)^{\gamma}$), with random combinations of $\alpha$ and $\gamma$ to produce realistic distributions, including both symmetric and asymmetric distribution curves (Fig. S1). The types of response curves were not selected to favor any form (e.g., symmetric, skewed, plateau, etc.) because true response curves of fishes are rarely known.
Each of the 18 species had a probability distribution pattern that was defined by randomly selecting one of the 100 beta curves, which we then converted from probability values (0-1) to abundance values (kg·ha⁻¹) using a simulated maximum biomass for each species. The maximum biomass of each species was modeled using a gamma probability distribution (rangam function in SAS) so that there would be fewer species with high abundance and more species with low abundance (Fig. S2). We constrained the peak biomass from the gamma distributions such that the sum of all 18 species at a river reach would average 400 kg·ha⁻¹, the global mean biomass of fishes in large rivers (Welcomme 1985; Fig. 1).
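A simplified sketch of this community-generation step is given below, using only the Python standard library. It is not the authors' SAS code, and several choices are assumptions made for illustration: reach positions are evenly spaced inside (0, 1), each curve is rescaled so its peak equals 1, the ranges for $\alpha$ and $\gamma$ and the gamma shape and scale are arbitrary, and the step that constrains curves to match observed occupancy percentages across the six sampled reaches is omitted.

```python
import random


def beta_curve(alpha, gamma, n_reaches=21):
    """Evaluate D = x**alpha * (1 - x)**gamma at evenly spaced reach positions."""
    xs = [(k + 1) / (n_reaches + 1) for k in range(n_reaches)]  # inside (0, 1)
    d = [x ** alpha * (1.0 - x) ** gamma for x in xs]
    peak = max(d)
    return [v / peak for v in d]      # assumption: rescale so the peak is 1


def simulate_community(n_species=18, n_reaches=21, target_mean_total=400.0, seed=1):
    """Return biomass (kg/ha) as a species x reach table."""
    rng = random.Random(seed)
    biomass = []
    for _ in range(n_species):
        # Assumed parameter ranges; the paper only says "random combinations".
        curve = beta_curve(rng.uniform(0.5, 5.0), rng.uniform(0.5, 5.0), n_reaches)
        # Gamma-distributed peak biomass: few abundant species, many scarce ones.
        peak_biomass = rng.gammavariate(1.0, 1.0)
        biomass.append([peak_biomass * v for v in curve])
    # Rescale so total biomass per reach averages the target (400 kg/ha).
    reach_totals = [sum(row[r] for row in biomass) for r in range(n_reaches)]
    scale = target_mean_total / (sum(reach_totals) / n_reaches)
    return [[b * scale for b in row] for row in biomass]


community = simulate_community(seed=1)
```

Rescaling all species by one common factor keeps the relative dominance produced by the gamma draws while meeting the 400 kg·ha⁻¹ average.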

Generating sampled distributions
Our analysis is based on comparing two sampling gears for fishes that produce CPUE data on different scales. We assumed that the two gears would be used independently in each of the six sampled reaches to generate reach-wide means of CPUE for each species. Sampling means for electrofishing and mini-fyke netting were generated in six of the 21 simulated river reaches. Catchability (q; standardized as CPUE) varies between electrofishing and mini-fyke nets, and we assumed an order of magnitude difference in q between gears for these simulations. We further structured q for both gear types such that one-third of the species were sampled well, one-third moderately, and one-third poorly, and we assumed that species sampled well by one gear were poorly sampled by the other gear (Fig. 2). The base condition for q in these simulations was 0.35, 0.035, and 0.0001 for electrofishing and 0.035, 0.0035, and 0.0005 for mini-fyke nets (Table S1). Note that sampling means for species with q = 0.0001 or 0.0005 will normally have CPUE = 0 with occasional incidences of detection (i.e., bycatch). To create variation in the sampling means for each species and river reach, we varied q for each species among river reaches using a defined coefficient of variation (CV). For each species, q at each river reach was multiplied by a positive real number drawn from a uniform distribution (ranuni function in SAS) with a mean of 1 and range defined by the specified CV. Mini-fyke net data are typically more variable than electrofishing data, so we used a CV around q of 25% for mini-fyke nets and 15% for electrofishing in baseline simulations. The simulations provided qualitatively similar patterns of biomass as those observed in fish monitoring data from the UMRS (Fig. S3, Fig. 2).
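The sampling step can be illustrated as follows. This sketch assumes that the "range defined by the specified CV" means a uniform distribution on [1 - w, 1 + w] whose standard deviation divided by its mean equals the CV, which gives w = CV·√3; the paper does not state the exact mapping. The function names and the three-species biomass table are hypothetical, and the sketch does not model the rounding to zero catch described for poorly sampled species.

```python
import math
import random


def uniform_multiplier(cv, rng):
    """Draw a multiplier from a uniform distribution with mean 1 and the given CV."""
    # A uniform on [1 - w, 1 + w] has SD = w / sqrt(3), so w = cv * sqrt(3).
    half_width = cv * math.sqrt(3.0)
    if half_width >= 1.0:
        raise ValueError("cv too large: multiplier could be non-positive")
    return rng.uniform(1.0 - half_width, 1.0 + half_width)


def sample_cpue(biomass, q_by_species, cv, rng):
    """Simulate reach-wide mean CPUE as biomass x catchability x random multiplier.

    biomass: species x reach table (kg/ha); q_by_species: one q per species.
    """
    return [[b * q * uniform_multiplier(cv, rng) for b in row]
            for row, q in zip(biomass, q_by_species)]


rng = random.Random(42)
biomass = [[120.0, 80.0], [40.0, 60.0], [10.0, 30.0]]   # hypothetical, 3 species x 2 reaches
q_electro = [0.35, 0.035, 0.0001]    # base-condition values from the text
q_fyke = [0.0005, 0.0035, 0.035]     # sampled well where electrofishing samples poorly
electro = sample_cpue(biomass, q_electro, cv=0.15, rng=rng)
fyke = sample_cpue(biomass, q_fyke, cv=0.25, rng=rng)
```

Under this mapping, the largest CV used in the sensitivity analyses (45%) gives w ≈ 0.78, so the multiplier stays positive as the text requires.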

Methods of combining data across gears
We compared four methods for combining CPUE across gears: (i) MGMS; (ii) relative abundance; (iii) SpeciesMax; and (iv) sCPUE. Relative abundance was calculated for each gear by dividing CPUE data for each species in an observation by the total CPUE of all species for that observation. SpeciesMax values were calculated by taking the CPUE for each species in an observation and dividing by the maximum CPUE of that species across all observations. For MGMS, relative abundance, and SpeciesMax, we combined data for the two gear types by summing the transformed data across gears. Values for sCPUE were calculated by summing the CPUE data for both gears without any additional standardization.
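For comparison with MGMS, the three alternative methods described above can be sketched as follows. The function names, the handling of all-zero rows or species (returned as 0), and the small data set are assumptions for illustration only.

```python
def relative_abundance(cpue):
    """Divide each species' CPUE by the total CPUE of its observation."""
    out = []
    for obs in cpue:
        total = sum(obs)
        out.append([c / total if total > 0 else 0.0 for c in obs])
    return out


def species_max(cpue):
    """Divide each species' CPUE by that species' maximum across observations."""
    n_species = len(cpue[0])
    maxima = [max(obs[i] for obs in cpue) for i in range(n_species)]
    return [[c / m if m > 0 else 0.0 for c, m in zip(obs, maxima)] for obs in cpue]


def sum_gears(a, b):
    """Add two observation x species tables element by element."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical CPUE for 3 observations x 2 species.
electro = [[10.0, 30.0], [20.0, 20.0], [30.0, 10.0]]
fyke = [[1.0, 0.0], [2.0, 1.0], [1.0, 1.0]]

combined_relative = sum_gears(relative_abundance(electro), relative_abundance(fyke))
combined_species_max = sum_gears(species_max(electro), species_max(fyke))
combined_scpue = sum_gears(electro, fyke)   # no standardization before summing
```

In this toy example the sCPUE sums are driven almost entirely by the electrofishing values, which is the scale problem that standardization is meant to address.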
For these simulations, the abundance distributions among river reaches for the 18 species are known. We evaluated how well each method reflected the true abundance patterns by converting both the actual species distributions and the four transformed sampling distributions (MGMS, relative abundance, SpeciesMax, and sCPUE; Fig. 3) for the 18 species at each reach to Bray-Curtis similarity matrices (6 × 6) and then used the Mantel test to correlate the similarity matrix from the actual species distributions with matrices from each of the four transformed sampling distributions. Each run of the model consisted of 1000 iterations of generating known and sampled species distributions, known and sampled Bray-Curtis similarity matrices, and Mantel correlation, allowing for robust descriptions of the correlation coefficient mean and variation for each standardization. We assessed the rate at which each method failed to correlate well with true abundance by counting the number of model runs where the correlation coefficient was below 0.5.
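The evaluation step can be sketched as below. The Mantel statistic is computed here as the Pearson correlation between the off-diagonal entries of two similarity matrices; the paper does not say whether a Pearson or rank-based correlation was used, so that choice is an assumption, as are the function names and the small data set. No permutation test is included because only the correlation coefficient is used in the comparison.

```python
import math


def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity: 1 - sum|a_i - b_i| / sum(a_i + b_i)."""
    denominator = sum(x + y for x, y in zip(a, b))
    if denominator == 0:
        raise ValueError("both observations are empty; similarity is undefined")
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / denominator


def similarity_matrix(rows):
    """Pairwise similarity among observations (rows = reaches, columns = species)."""
    n = len(rows)
    return [[bray_curtis_similarity(rows[i], rows[j]) for j in range(n)]
            for i in range(n)]


def mantel_r(m1, m2):
    """Pearson correlation between the lower triangles of two square matrices."""
    n = len(m1)
    x = [m1[i][j] for i in range(n) for j in range(i)]
    y = [m2[i][j] for i in range(n) for j in range(i)]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical "true" biomass for 4 reaches x 3 species, and an estimate that
# differs from it only by a constant factor.
true_biomass = [[100.0, 50.0, 10.0], [80.0, 60.0, 20.0],
                [20.0, 60.0, 80.0], [10.0, 30.0, 100.0]]
estimate = [[0.5 * v for v in row] for row in true_biomass]
r = mantel_r(similarity_matrix(true_biomass), similarity_matrix(estimate))
```

Bray-Curtis similarity is unchanged when every value is multiplied by the same constant, so an estimate that differs from the truth only in overall scale yields a Mantel correlation of 1.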

Sensitivity analyses
We conducted sensitivity analyses examining how data transformations, the coefficient of variation around catchability (see Table 1), and the differences in catchability among species (see Table 2) affected the Mantel correlations with the actual species distributions. For many multivariate analyses, data transformations such as square root or logarithmic transformations or standardizations such as relative abundance or SpeciesMax are used to either achieve multivariate normality or reduce the influence of dominant species on results. Therefore, we compared model runs with untransformed CPUE data with runs where CPUE were log-transformed ($\log_{10}(\mathrm{CPUE} + 1)$) or square-root-transformed prior to the multigear mean standardization. To examine sensitivity to the coefficient of variation around catchability, we used three different combinations of CV levels: 5% and 10%; 15% and 20%; and 35% and 45% for electrofishing and mini-fyke nets, respectively. To examine variation in catchability, we performed four analyses: (i) increasing the q values of electrofishing to 0.75, 0.075, and 0.0001, producing an even greater disparity in catchability between the two gears; (ii) using equal catchability levels (q = 0.1) for all species in both gears; and (iii) using random values for q (range 0-1) for all species and gears.

Results
Across a variety of settings, MGMS generally produced the highest mean correlation coefficients and the lowest variation. We evaluated the sensitivity of different methods to changes in the catchability coefficient (Table 2), and MGMS had the highest mean correlation coefficient across different settings. We also evaluated the sensitivity of different methods to increasing the coefficient of variation of the catchability coefficient (Table 1), and MGMS again produced the highest mean correlation coefficient. SpeciesMax and sCPUE produced mean correlation coefficients that were high but slightly below that of MGMS in most instances, while relative abundance performed the worst across all settings. Transformations had little impact on the relative performance of each method; MGMS produced the highest mean correlation coefficient and lowest standard deviation in untransformed, square-root-transformed, and log-transformed data (Fig. S4; Table S2).
Both MGMS and SpeciesMax produced few model runs with correlation coefficients less than 0.5, regardless of the scenario (Tables 1 and 2). More runs with low correlations occurred with sCPUE, and relative abundance produced more poor correlations than any other method by an order of magnitude. While the use of 0.5 as a cutoff of poor correlation is somewhat arbitrary, the distribution of 1000 iterations shows MGMS generally had the fewest number of low-performing iterations (Fig. S4).

Discussion
Of the methods we evaluated, MGMS provided the best correlations with the actual community structure patterns across a variety of scenarios and data transformations, suggesting it offers an effective method of combining measures of abundance from different sampling gears. Two additional methods, sCPUE and SpeciesMax, also perform relatively well, while the lower correlations obtained using relative abundance suggest it is a less desirable transformation for combining data from different gears for analysis of community structure. Moreover, MGMS, sCPUE, and SpeciesMax were able to maintain relatively high accuracy even when levels of catchability across species were highly variable (e.g., random catchability and high CV values), suggesting they should be robust to scenarios encountered in many situations. Thus, with the exception of relative abundance, the other methods of combining CPUE data also showed potential for use in community structure analyses. However, although sCPUE performed better than expected, it was more vulnerable to increasing coefficient of variation in the catchability coefficient. In many contexts, the coefficient of variation in catchability may be high and (or) unknown. Given that MGMS and SpeciesMax appear to be less vulnerable to increasing variation, these methods are probably better suited than sCPUE for analysis of community structure.
Although Jackson and Harvey (1997) warned of potential problems with differing patterns of species covariation among gears, the approach may still be desirable, especially when different gears are used to sample different species, size classes, or habitats within a given ecosystem (i.e., where uniform patterns of species covariation are not expected). In the latter case, researchers could weight the standardized CPUE data according to the proportion of the ecosystem composed of each habitat. For example, if electrofishing and hoop nets are used to sample fishes in different habitats and the total hectares of available habitat sampled by electrofishing is 25% of the habitat available for hoop netting, then we could multiply electrofishing CPUE by 0.25 before combining the two data sets. An alternative recommended by Weaver et al. (1993) is to build a data set composed of combinations of gear type, species, and life stage (e.g., there are unique rows for each combination of species, gear type, and life stage). In this case, our standardization technique is still desirable for preventing gears with the largest values of CPUE from dominating the results.
The application of MGMS to achieve representative fish community structure patterns from data collected by multiple gears should be immediately useful to a growing number of research fields. Increasingly, management of aquatic ecosystems is shifting away from single-species management toward a more ecosystem-based approach that considers how entire assemblages are affected by processes. For instance, homogenization of fish faunas can occur when habitat change or degradation occurs (e.g., Gido et al. 2009). Similarly, changes in assemblage-level data can form the basis of large-scale water quality assessments that may shape policy decisions (Davies and Jackson 2006). Assessments of community structure, typically through multivariate analysis such as ANOSIM, are ultimately dependent on robust estimates of abundance patterns, which ultimately require sampling with multiple gears. The strong performance of MGMS suggests it is effective for combining data from multiple gear types for use in multivariate analyses.
No standardization technique is a panacea for all the problems that must be addressed to analyze combined data from multiple gears. Issues of selecting which species to include from each gear type and how to properly weight data from each gear need careful consideration (Jackson and Harvey 1997). For example, MGMS does not alter the variance structure of the data (e.g., the coefficient of variation across samples for the raw and standardized data will be equal because the data are simply divided by a constant). If there are substantial differences in variance among gears, it would be advisable to apply standard transformation techniques (e.g., square root, logarithmic) before standardization with MGMS, especially where multivariate normal data are an assumption of the analysis techniques. Note that different transformation techniques could be conducted for different gears prior to standardization with MGMS. Alternatively, MGMS standardized data from multiple gears could be combined using inverse-variance weighting to adjust for differences in variance among gears. Once data from multiple gears are standardized and combined, researchers still need to decide whether to apply further transformations (e.g., square root, fourth-root, or adjusting species to equal maximum) to dampen the influence of dominant species. Thus, the standardization technique we suggest here is just a first step in properly analyzing community data from multiple sampling gears. Although we have discussed this technique in the context of combining multiple fish sampling methods, it would be equally applicable to any community of organisms sampled with multiple methods.
Our study is an attempt to introduce a new standardization technique and present an evaluation in the context of analysis of community structure using similarity- or dissimilarity-based multivariate techniques. Based on this study, we feel that MGMS is well suited for analysis of community structure because it preserves both information on the abundance patterns of species across observations and the patterns of relative abundance across species within each observation. The efficacy of using MGMS for combining data from multiple sampling gears for other contexts and analyses, such as stock assessment of individual species for management of marine fisheries (sensu Conn 2010), will require additional assessment. Even in the context of the analysis of community structure, scientists should have specific goals and objectives for combining data from multiple gears. For example, if the primary objective for combining information is to increase the number of species available for analysis or to include multiple life stages of species in the analysis, MGMS should be a good option. If the primary objective is to simply combine data from multiple gears where the trends for each species are assumed to be the same for each gear, consideration should be given to the techniques of standardization of CPUE data for stock assessment (sensu Maunder and Punt 2004; Conn 2010). Finally, researchers should carefully consider the assumptions of the analysis technique and be sure MGMS is an appropriate standardization technique for that analysis.
Note: All iterations of the model were run with a coefficient of variation of 40% for q.

Fig. 1. Two examples of abundance distributions generated by the model for six species across 20 river reaches.

Fig. 2. An example of (A) an abundance distribution among the six sample reaches generated by the model, (B) electrofishing mass per unit effort (MPUE) data generated by the model, and (C) mini-fyke MPUE data generated by the model.

Fig. 3. Examples of distribution patterns for (A) the actual biomass of each species generated by the model and distributions formed by combining model electrofishing and mini-fyke netting data using (B) multigear mean standardization (MGMS), (C) relative abundance, (D) SpeciesMax, and (E) standardized catch per unit effort (sCPUE).

Table 1. Mean correlation coefficient (r ± 1 SD) from sensitivity analyses for coefficient of variation (CV) of the catchability coefficient (q).

Table 2. Mean correlation coefficient (r ± 1 SD) from sensitivity analyses for the catchability coefficient (q).