28 Fishery Data and Observations

The operating model generates fishery data for each simulation replicate across both the historical and projection periods. Management procedures receive these data and use them to set management advice. Two sources of data are supported:

Real observed data (OM@Data): supplied by the user for historical time steps. Real data are never overwritten by simulated values.
Simulated data: generated by applying observation error to true population quantities. Simulated values are only produced for data types where an observation model (OM@Obs) has been configured and no real data exist for that time step.

The same logic applies during the projection period: data are only simulated for time steps not already covered by real data. If observed data in OM@Data extend beyond the historical period (e.g. when conditioning on a dataset that runs past OM@nYear), those values are preserved and not overwritten by the simulation.
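The real-data-first rule can be illustrated with a minimal Python sketch. The function name `merge_real_and_simulated` and the use of `None` for a missing value are assumptions made for illustration; they are not part of the package.

```python
def merge_real_and_simulated(real, simulated):
    """Combine two series: keep real values, fill gaps with simulated ones.

    `None` marks a time step with no value (hypothetical convention).
    """
    n_steps = max(len(real), len(simulated))
    merged = []
    for t in range(n_steps):
        real_value = real[t] if t < len(real) else None
        sim_value = simulated[t] if t < len(simulated) else None
        # Real data always win; simulated data only fill missing time steps.
        merged.append(real_value if real_value is not None else sim_value)
    return merged


# Four historical years plus two projection years; the real series
# extends one year into the projection period.
real_landings = [100.0, 120.0, None, 90.0, 95.0, None]
simulated_landings = [98.0, 125.0, 110.0, 88.0, 101.0, 97.0]
merged = merge_real_and_simulated(real_landings, simulated_landings)
print(merged)
```

Only the third and sixth time steps take simulated values here, because every other step already has a real observation.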

The result for the historical period is stored in Hist@Data and accessed via Data(Hist). During the projection period, data are stored in MSE@PPD and accessed via PPD(MSE). See Section 28.4 for details.

28.1 Observation Object

Observation error is configured through the Obs class, accessed as OM@Obs[[complex]][[fleet]]. Each element is an obs object with one sub-object per data type:

Slot	Class	Data type
`Effort`	`effortobs`	Fishing effort
`Landings`	`catchobs`	Landed catch
`Discards`	`catchobs`	Discarded catch
`CPUE`	`indicesobs`	Fishery-dependent abundance indices
`Survey`	`indicesobs`	Fishery-independent survey indices
`LandingsAtAge`	`compobs`	Age composition of landings
`DiscardsAtAge`	`compobs`	Age composition of discards
`LandingsAtSize`	`compobs`	Size composition of landings
`DiscardsAtSize`	`compobs`	Size composition of discards

For each data type, data are generated in one of two ways: either the user supplies real observed data in OM@Data (in which case observation error parameters in OM@Obs are conditioned on those data automatically), or the user configures OM@Obs directly to simulate that data type. If neither real data nor a configured Obs sub-object exists for a given fleet and data type, no data are generated for that combination.

28.2 Real Data Conditioning

When real observed data are provided in OM@Data, the observation error parameters in OM@Obs are conditioned on those data before any simulated data are generated. This is done by ConditionObs(), which fits observation error parameters fleet by fleet for each data type.

28.2.1 Catch and Effort

For landings, discards, and effort, the conditioning fits a per-simulation multiplicative bias and a lognormal error standard deviation by comparing OM-simulated values to the observed time series. The years used for conditioning can be specified per data type via the Years slot of the relevant sub-object (e.g. CatchObs@Years); by default all historical years are used. The conditioning period should be representative of the observation error expected during the projection period. For example, it can be restricted to more recent years if early data were collected under different survey designs or reporting standards.

Let $\hat{y}_{s,t}$ be the OM-simulated value (catch or effort) for simulation $s$ and year $t$, $y_t$ the observed value, and $T$ the number of conditioning years. The multiplicative bias for simulation $s$ is estimated as the mean ratio of observed to simulated values across conditioning years:

\[b_s = \frac{1}{T}\sum_{t=1}^{T} \frac{y_t}{\hat{y}_{s,t}}\]

The raw historical error multipliers are:

\[\varepsilon_{s,t} = \frac{y_t}{\hat{y}_{s,t} \cdot b_s}\]

The multipliers are standardised so that $\bar{\varepsilon}_{s,\cdot} = 1$, separating the bias component from the variance component. The lognormal standard deviation $\sigma_s$ is estimated from the variance of $\log \varepsilon_{s,t}$ across conditioning years. Projection-year errors are drawn as $\varepsilon_{s,t} \sim \text{LogNormal}(-\sigma_s^2/2,\, \sigma_s^2)$.

Conditioned parameters are stored in the relevant sub-object: Bias, CV, and Error (dimensions Sim × Year, spanning both historical and projection years).
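The conditioning steps for a single simulation can be sketched in Python as follows. This is an illustration under stated assumptions, not the ConditionObs() implementation: the function name `condition_catch_obs` is hypothetical, the bias is taken as the mean ratio of observed to simulated values (the orientation under which the multipliers average to one and observed = simulated × bias × error), and the sample standard deviation is assumed for $\sigma_s$.

```python
import math
import statistics


def condition_catch_obs(simulated, observed):
    """Return (bias, error multipliers, sigma) for one simulation."""
    # Ratio of observed to simulated value in each conditioning year.
    ratios = [y / y_hat for y_hat, y in zip(simulated, observed)]
    # Multiplicative bias: mean ratio across conditioning years.
    bias = statistics.fmean(ratios)
    # Error multipliers: what remains after removing the bias (mean is 1).
    errors = [ratio / bias for ratio in ratios]
    # Lognormal SD from the spread of the log multipliers
    # (sample SD is an assumption of this sketch).
    sigma = statistics.stdev([math.log(e) for e in errors])
    return bias, errors, sigma


simulated = [100.0, 200.0, 400.0]   # OM-simulated catch, one simulation
observed = [110.0, 180.0, 480.0]    # real observed catch
bias, errors, sigma = condition_catch_obs(simulated, observed)
print(round(bias, 4), [round(e, 4) for e in errors], round(sigma, 4))
```

Multiplying each simulated value by `bias` and its entry in `errors` recovers the observed series exactly, which is why historical multipliers reproduce the real data.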

28.2.2 Abundance Indices

For CPUE and survey indices, conditioning estimates catchability $q$, observation error standard deviation $\sigma$, and lag-1 autocorrelation $\rho$.

The nominal simulated index for simulation $s$ and year $t$ is the selectivity- and area-weighted abundance summed over ages and areas:

\[\hat{I}_{s,t} = \sum_k \sum_a N_{a,k,t} \cdot \bar{W}_{a,t} \cdot v_{a,t}\]

where $v_{a,t}$ is the index selectivity-at-age (user-specified or taken from fleet selectivity). The index can be in biomass, numbers, or recruitment units (IndexObs@Units). Catchability is estimated as the ratio of mean observed to mean simulated index over the non-NA conditioning years:

\[q_s = \frac{\bar{y}}{\bar{\hat{I}}_s}\]

Both observed and simulated series are standardised to mean 1 before computing log-residuals:

\[r_{s,t} = \log\!\left(\frac{y_t / \bar{y}}{\hat{I}_{s,t} / \bar{\hat{I}}_s}\right)\]

The observation error standard deviation and lag-1 autocorrelation are estimated from these residuals:

\[\sigma_s = \text{sd}(r_{s,t}), \qquad \rho_s = \text{cor}(r_{s,t},\, r_{s,t-1})\]
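The index conditioning above can be sketched in Python for one simulation. The names `condition_index_obs` and `pearson` are hypothetical, `None` stands in for NA, the sample standard deviation is assumed, and the lag-1 correlation is computed only over pairs of consecutive non-missing years; how the package treats gaps is not stated here, so that last choice is an assumption.

```python
import math
import statistics


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def condition_index_obs(simulated, observed):
    """Return (q, residuals by year, sigma, rho) for one simulation."""
    used = [t for t, y in enumerate(observed) if y is not None]
    mean_sim = statistics.fmean([simulated[t] for t in used])
    mean_obs = statistics.fmean([observed[t] for t in used])
    q = mean_obs / mean_sim  # catchability: ratio of the two means
    # Log-residuals of the mean-standardised series.
    residuals = {
        t: math.log((observed[t] / mean_obs) / (simulated[t] / mean_sim))
        for t in used
    }
    sigma = statistics.stdev(list(residuals.values()))
    # Lag-1 pairs (r_t, r_{t-1}) restricted to consecutive years.
    pairs = [(residuals[t], residuals[t - 1]) for t in used if t - 1 in residuals]
    rho = pearson([p[0] for p in pairs], [p[1] for p in pairs])
    return q, residuals, sigma, rho


simulated_index = [10.0, 12.0, 9.0, 11.0, 8.0, 10.0]
observed_index = [21.0, 23.0, 17.0, 24.0, 15.0, None]  # last year is NA
q, residuals, sigma, rho = condition_index_obs(simulated_index, observed_index)
print(round(q, 3), round(sigma, 3), round(rho, 3))
```

Because both series are standardised by their means, $\exp(r_{s,t})$ equals $y_t / (q_s \hat{I}_{s,t})$, so the residuals double as historical error multipliers.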

For projection years, innovations $\eta_{s,t}$ are generated as independent draws from a truncated normal distribution (truncated at $\pm \tau \sigma_s$, where $\tau$ is IndexObs@TruncSD, default 2), then propagated forward using an AR(1) recursion seeded from the final historical log-residual, $\tilde{r}_{s,T} = r_{s,T}$:

\[\tilde{r}_{s,t} = \rho_s \, \tilde{r}_{s,t-1} + \eta_{s,t} \sqrt{1 - \rho_s^2}, \qquad \eta_{s,t} \sim \mathcal{N}(0,\, \sigma_s^2) \text{ truncated to } [-\tau\sigma_s,\, \tau\sigma_s]\]

The error multipliers stored in IndexObs@Error are $\exp(r_{s,t})$ for historical years and $\exp(\tilde{r}_{s,t})$ for projection years.
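A minimal Python sketch of the projection-year recursion follows. The function names are hypothetical, and rejection sampling is just one simple way to draw from a truncated normal; the package may use a different sampler.

```python
import math
import random


def truncated_normal(rng, sigma, trunc_sd):
    """Draw from N(0, sigma^2) restricted to +/- trunc_sd * sigma."""
    if sigma == 0:
        return 0.0
    bound = trunc_sd * sigma
    while True:  # rejection sampling: redraw until inside the bounds
        draw = rng.gauss(0.0, sigma)
        if abs(draw) <= bound:
            return draw


def project_index_errors(last_residual, sigma, rho, n_years, trunc_sd=2.0, seed=1):
    """Return exp(residual) for each projection year of one simulation."""
    rng = random.Random(seed)
    residual = last_residual  # seed with the final historical log-residual
    multipliers = []
    for _ in range(n_years):
        eta = truncated_normal(rng, sigma, trunc_sd)
        # AR(1) step: carry over rho of the previous residual, add scaled noise.
        residual = rho * residual + eta * math.sqrt(1.0 - rho ** 2)
        multipliers.append(math.exp(residual))
    return multipliers


projected = project_index_errors(last_residual=0.15, sigma=0.3, rho=0.6, n_years=5)
print([round(m, 3) for m in projected])
```

With `sigma=0` the residual simply decays geometrically from its historical value, which makes the role of $\rho_s$ easy to check by hand.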

Note

Hyperstability/hyperdepletion (the $\beta$ parameter, where $I \propto B^\beta$) is not currently estimated during conditioning; $\beta = 1$ is assumed.

28.2.3 Composition Data

For age and size compositions, conditioning estimates per-bin log-offsets (Shift) and effective sample sizes (ESS) by comparing observed proportions to predicted proportions from the true population.

Let $\hat{p}_{b,t}$ be the predicted proportion in bin $b$ at time $t$, and $o_{b,t}$ the observed proportion. The log-offset for each bin is:

\[\Delta_{b,t} = \log(o_{b,t}) - \log(\hat{p}_{b,t})\]

The effective sample size is estimated using the harmonic-mean method:

\[\widehat{\text{ESS}}_t = \frac{\sum_b \hat{p}_{b,t}(1 - \hat{p}_{b,t})}{\sum_b (o_{b,t} - \hat{p}_{b,t})^2}\]

capped at the observed sample size. These are stored in CompObs@Shift and CompObs@ESS.
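The two composition quantities can be sketched for a single time step in Python. The function name `condition_comp_obs` is hypothetical, and the sketch assumes all proportions are strictly positive (the handling of zero bins is not described above) and that a perfect fit is assigned the observed sample size.

```python
import math


def condition_comp_obs(predicted, observed, sample_size):
    """Return (per-bin log-offsets, effective sample size) for one time step."""
    # Log-offset per bin: log observed minus log predicted proportion.
    shift = [math.log(o) - math.log(p) for o, p in zip(observed, predicted)]
    # ESS: multinomial variance implied by the prediction over squared error.
    numerator = sum(p * (1.0 - p) for p in predicted)
    denominator = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    if denominator == 0:
        return shift, float(sample_size)  # perfect fit: assume the cap applies
    ess = min(numerator / denominator, float(sample_size))
    return shift, ess


predicted = [0.2, 0.5, 0.3]
observed = [0.25, 0.45, 0.30]
shift, ess = condition_comp_obs(predicted, observed, sample_size=200)
print([round(d, 3) for d in shift], round(ess, 1))
```

Here the numerator is 0.62 and the denominator 0.005, giving an ESS of about 124; with an observed sample size of 100 the cap would apply instead.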

28.3 Simulated Data Generation

After conditioning, GenerateHistoricalData() generates observed data for each simulation replicate across all historical years simultaneously. During the projection period, GenerateProjectionData() is called each time step and appends one year at a time to the growing Data object passed to the management procedure. The observation error model is the same in both cases.

28.3.1 Catch and Effort

Observed catch or effort for simulation $s$ and year $t$ is:

\[\tilde{y}_{s,t} = \hat{y}_{s,t} \cdot b_s \cdot \varepsilon_{s,t}\]

where $b_s$ is the per-simulation bias and $\varepsilon_{s,t}$ is the lognormal error multiplier from CatchObs@Error (or EffortObs@Error). Fleets without a configured EffortObs or CatchObs receive NA values for that data type.
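As an illustration, the observation equation for projection years can be sketched in Python. The function name `observe_catch` is hypothetical; the log-scale mean of $-\sigma_s^2/2$ is the mean-one lognormal parameterisation described in Section 28.2.1.

```python
import random


def observe_catch(true_values, bias, sigma, seed=1):
    """Apply bias and mean-one lognormal error to a true catch series."""
    rng = random.Random(seed)
    mu = -0.5 * sigma ** 2  # log-scale mean that makes E[multiplier] = 1
    return [y * bias * rng.lognormvariate(mu, sigma) for y in true_values]


true_catch = [100.0, 200.0, 150.0]
observed_catch = observe_catch(true_catch, bias=1.1, sigma=0.2)
print([round(y, 1) for y in observed_catch])

# Check of the parameterisation: the average multiplier is close to 1.
draws = observe_catch([1.0] * 20000, bias=1.0, sigma=0.2, seed=42)
print(round(sum(draws) / len(draws), 3))
```

The offset of $-\sigma_s^2/2$ matters: without it the multipliers would average $\exp(\sigma_s^2/2)$ and add a positive bias on top of $b_s$.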

28.3.2 Abundance Indices

The observed index for simulation $s$ and year $t$ is:

\[\tilde{I}_{s,t} = q_s \cdot \hat{I}_{s,t} \cdot \varepsilon_{s,t}\]

where $q_s$ is the conditioned catchability (IndexObs@Efficiency) and $\varepsilon_{s,t} = \exp(r_{s,t})$ is the error multiplier from IndexObs@Error. When conditioning on real data, historical error multipliers are derived directly from the log-residuals of the conditioning fit. For the projection years the observation error values are generated by the AR(1) process above. When indices are purely simulated (no real data), the AR(1) process is applied across the full historical and projection period.

28.3.3 Composition Data

Composition data are generated by Dirichlet-Multinomial sampling. The concentration parameters for bin $b$ are:

\[\alpha_b = \text{ESS} \cdot \Theta \cdot \hat{p}_b \cdot \exp(\Delta_b)\]

where $\hat{p}_b$ is the true catch proportion in bin $b$, $\Delta_b$ is the conditioned log-offset (CompObs@Shift; zero if no real data were used for conditioning), $\Theta$ is an overdispersion parameter (CompObs@Theta), and ESS is the effective sample size. Observed counts are then drawn as:

\[\mathbf{n} \sim \text{DirichletMultinomial}(N,\, \boldsymbol{\alpha})\]

where $N = $ CompObs@SampleSize. The observed composition is reported as proportions $o_b = n_b / N$.
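Dirichlet-Multinomial sampling can be sketched with the Python standard library by first drawing bin probabilities from a Dirichlet (normalised gamma draws) and then drawing counts from a multinomial. The function name and argument names are hypothetical, and giving zero weight to bins with a zero concentration parameter is an assumption of this sketch.

```python
import math
import random


def sample_dirichlet_multinomial(true_props, ess, theta, shift, sample_size, seed=1):
    """Return observed proportions for one composition sample."""
    rng = random.Random(seed)
    # Concentration per bin: ESS * Theta * true proportion * exp(log-offset).
    alpha = [ess * theta * p * math.exp(d) for p, d in zip(true_props, shift)]
    # Dirichlet draw via normalised gamma variates; empty bins get weight 0.
    gammas = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in alpha]
    total = sum(gammas)
    probs = [g / total for g in gammas]
    # Multinomial draw of sample_size individuals given those probabilities.
    counts = [0] * len(probs)
    for b in rng.choices(range(len(probs)), weights=probs, k=sample_size):
        counts[b] += 1
    return [c / sample_size for c in counts]


true_props = [0.1, 0.4, 0.3, 0.2, 0.0]
obs_props = sample_dirichlet_multinomial(
    true_props, ess=50.0, theta=1.0, shift=[0.0] * 5, sample_size=200
)
print(obs_props)
```

Smaller values of ESS × Θ spread the Dirichlet draw further from the true proportions, so the sampled compositions are more variable than a plain multinomial with the same sample size.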

28.4 Output Structure

28.4.1 Historical Period

Generated data are stored in Hist@Data, a list of length nSim (compressed to length 1 if all replicates are identical) and accessed via Data(Hist). Each element is a named list by stock complex, containing a data object with the following slots:

Slot	Dimensions
`Effort@Value`	`Year × Fleet`
`Landings@Value`	`Year × Fleet`
`Discards@Value`	`Year × Fleet`
`CPUE@Value`	`Year × Fleet`
`Survey@Value`	`Year × Fleet`
`LandingsAtAge@Value`	`Year × Fleet × Age`
`DiscardsAtAge@Value`	`Year × Fleet × Age`
`LandingsAtSize@Value`	`Year × Fleet × Class`
`DiscardsAtSize@Value`	`Year × Fleet × Class`

Each slot also carries a parallel @CV array of the same dimensions. Area is not retained in Hist@Data; the catch and indices are summed over areas before observation error is applied.

28.4.2 Projection Period

During the projection period, data are stored in MSE@PPD (posterior predictive data), a named list by management procedure, and accessed via PPD(MSE).

To avoid replicating the historical data across all MPs, MSE@PPD[[1]] (the first MP) stores the full time series spanning both historical and projection years. All subsequent MPs store projection years only. PPD(MSE) reconstructs the complete time series for every MP by prepending the historical data from MSE@PPD[[1]] to each projection-only object.
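The storage scheme can be illustrated with a small Python sketch in which each MP's data are reduced to a single list of annual values. The function `reconstruct_ppd`, the dictionary layout, and the MP names are hypothetical stand-ins for the package objects.

```python
def reconstruct_ppd(stored, n_hist):
    """Rebuild the full time series for every MP.

    `stored` maps MP name -> list of values; the first MP holds historical
    plus projection years, the others hold projection years only.
    """
    mp_names = list(stored)
    # Historical years are read from the first MP only.
    historical = list(stored[mp_names[0]][:n_hist])
    full = {}
    for i, name in enumerate(mp_names):
        series = list(stored[name])
        # The first MP is already complete; prepend history to the rest.
        full[name] = series if i == 0 else historical + series
    return full


stored_ppd = {
    "MP_A": [10.0, 11.0, 12.0, 13.0, 14.0],  # 3 historical + 2 projection years
    "MP_B": [12.5, 13.5],                    # projection years only
}
full_ppd = reconstruct_ppd(stored_ppd, n_hist=3)
print(full_ppd["MP_B"])
```

After reconstruction every MP has a series of the same length, with identical values over the historical years.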

The same observation model and real-data-first logic apply during projection: simulated values are only generated for time steps not already covered by real data in OM@Data, and only for data types for which a configured Obs sub-object exists.