When we send data to the FFC, it needs to meet a specific set of requirements. The code in this file aims to make sure input data, whether gaged or a manual timeseries, follows those requirements, which are listed under Details.

filter_timeseries(
  timeseries,
  date_field,
  flow_field,
  date_format_string,
  max_missing_days = 7,
  max_consecutive_missing_days = 1,
  fill_gaps = "no"
)

Arguments

timeseries.

A timeseries data frame with date and flow fields

date_field.

character vector/string with the name of the field that holds the dates

flow_field.

character vector/string with the name of the field that holds flow values

date_format_string.

A format string to use when parsing dates into POSIX time
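As a rough illustration of what a date format string does, the sketch below parses a few dates with base R. It assumes the format string uses the usual `strptime`-style codes; the sample dates and the `"%m/%d/%Y"` format are made up for the example and are not taken from the package.

```r
# Illustrative only: the format string and sample values are assumptions.
date_format_string <- "%m/%d/%Y"
raw_dates <- c("10/01/2019", "10/02/2019", "13/45/2019")

# strptime() parses text according to the format string; as.POSIXct()
# converts the result to POSIX time. A fixed tz avoids daylight-saving shifts.
parsed <- as.POSIXct(
  strptime(raw_dates, format = date_format_string, tz = "UTC"),
  tz = "UTC"
)

# Values that do not match the format come back as NA, so they are easy to find.
unparsed <- is.na(parsed)

print(format(parsed, "%Y-%m-%d"))
print(unparsed)
```

Checking for NA after parsing is a cheap way to catch a format string that does not match the data before any filtering happens.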

max_missing_days.

How many days can be missing from each water year before it is considered too incomplete and will be dropped? A water year is dropped only when it has *more* missing days than this value (so, the default of 7 means that a water year with 7 missing values will be kept, but a water year with 8 missing values will be dropped).
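The threshold behaviour can be sketched in a few lines of base R. This is not the package's implementation: `count_missing_days` is a hypothetical helper, and it assumes a water year runs from October 1 through September 30 and is labeled by the year in which it ends.

```r
# Hypothetical helper (not the package's code): count missing days in one
# water year. ASSUMPTION: water year `wy` runs Oct 1 of wy - 1 to Sep 30 of wy.
count_missing_days <- function(dates, flows, wy) {
  expected <- seq(as.Date(sprintf("%d-10-01", wy - 1L)),
                  as.Date(sprintf("%d-09-30", wy)),
                  by = "day")
  # A day counts as observed only if it has a row AND a non-NA flow value.
  observed <- dates[!is.na(flows)]
  sum(!(expected %in% observed))
}

max_missing_days <- 7

# Build water year 2020, then remove 7 scattered days.
all_days <- seq(as.Date("2019-10-01"), as.Date("2020-09-30"), by = "day")
drop_idx <- c(10, 40, 70, 100, 130, 160, 190)
dates7 <- all_days[-drop_idx]
flows7 <- rep(10, length(dates7))

n7 <- count_missing_days(dates7, flows7, 2020L)
keep7 <- n7 <= max_missing_days   # 7 missing days: kept

# One more gap, this time a row that is present but has an NA flow.
flows8 <- flows7
flows8[200] <- NA
n8 <- count_missing_days(dates7, flows8, 2020L)
keep8 <- n8 <= max_missing_days   # 8 missing days: dropped
```

The comparison is `<=`, which matches the description above: exactly `max_missing_days` missing values is still acceptable.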

max_consecutive_missing_days.

How many days in a row can be missing before a water year is dropped? This is evaluated independently from max_missing_days, so a single failure of the rule attached to this parameter causes the water year to be dropped. The previous parameter handles the total number of missing values across a water year, while this parameter only looks at each gap's length. A max_consecutive_missing_days value of 1 means each gap can only be a single day in length. For example, if we had values for days 1 and 2, were missing day 3, and then had values for days 4 and 5, that would be acceptable. If we were also missing day 2 or day 4, the gap would be two days long and the water year would be dropped.
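One way to measure gap lengths is run-length encoding. The sketch below uses base R's `rle()` on a logical vector that marks missing days; `longest_gap` is a hypothetical helper written for this example, not a function from the package.

```r
# Hypothetical helper: length of the longest run of consecutive missing days.
# `is_missing` is a logical vector with one entry per day (TRUE = missing).
longest_gap <- function(is_missing) {
  runs <- rle(is_missing)
  gap_lengths <- runs$lengths[runs$values]  # keep only the runs of TRUE
  if (length(gap_lengths) == 0) 0L else max(gap_lengths)
}

max_consecutive_missing_days <- 1

# Days 1-5 with only day 3 missing: a single one-day gap.
one_day_gap <- c(FALSE, FALSE, TRUE, FALSE, FALSE)
# Days 1-5 with days 3 and 4 missing: a two-day gap.
two_day_gap <- c(FALSE, FALSE, TRUE, TRUE, FALSE)

ok_one <- longest_gap(one_day_gap) <= max_consecutive_missing_days
ok_two <- longest_gap(two_day_gap) <= max_consecutive_missing_days
```

Here `ok_one` is `TRUE` and `ok_two` is `FALSE`, matching the day 1 to day 5 example in the parameter description.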

fill_gaps.

Currently unimplemented, but would allow for filling remaining gaps after the water year rules are evaluated according to the other parameters. Defaults to "no", which turns it off. In the future, we expect to add two other options here: "linear" to linearly interpolate across gaps and "previous" to fill gaps with the closest valid value before the gap.
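Since gap filling is not implemented, the following is only a guess at what the two planned options could look like in base R. `fill_linear` and `fill_previous` are hypothetical names, and the behaviour at the edges of the record (a gap before the first valid value stays NA) is an assumption of this sketch.

```r
# Hypothetical sketch of "linear": interpolate across NA values by position.
# Needs at least two non-NA values; leading/trailing NAs are left as NA.
fill_linear <- function(flows) {
  idx <- seq_along(flows)
  ok <- !is.na(flows)
  approx(idx[ok], flows[ok], xout = idx, rule = 1)$y
}

# Hypothetical sketch of "previous": carry the last valid value forward.
fill_previous <- function(flows) {
  ok <- !is.na(flows)
  # For each position, the index of the most recent non-NA value (0 if none).
  last_ok <- cummax(ifelse(ok, seq_along(flows), 0L))
  out <- rep(NA_real_, length(flows))
  out[last_ok > 0] <- flows[last_ok[last_ok > 0]]
  out
}

flows <- c(10, NA, 14, NA, NA, 20)
linear_filled <- fill_linear(flows)      # 10 12 14 16 18 20
previous_filled <- fill_previous(flows)  # 10 10 14 14 14 20
```

With the default `max_consecutive_missing_days` of 1, the two-day gap in this toy vector would already have caused the water year to be dropped; it is included only to show how the two strategies differ.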

Details

1. We should only send complete water years - partial water years at the beginning or end have an outsize influence on the calculations. This rule will actually be handled just by adhering to the next two rules.

2. Along the same lines, we shouldn't allow any large gaps. When we find a gap of *2 or more days*, then we should drop the entire water year.

3. We should allow at most 7 total missing days in a water year, which we will fill either with values from the previous day, or a linear interpolation (maybe with a flag?) before sending to the FFC.
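To see why rule 1 falls out of rules 2 and 3, the sketch below applies both checks to a logical vector of missing days for one water year. `keep_water_year` is a hypothetical helper using the documented defaults, not the package's own code.

```r
# Hypothetical helper: TRUE if a water year passes both rules.
# `is_missing` has one entry per day of the water year (TRUE = missing).
keep_water_year <- function(is_missing,
                            max_missing_days = 7,
                            max_consecutive_missing_days = 1) {
  runs <- rle(is_missing)
  gaps <- runs$lengths[runs$values]
  total_ok <- sum(is_missing) <= max_missing_days
  consecutive_ok <- length(gaps) == 0 ||
    max(gaps) <= max_consecutive_missing_days
  total_ok && consecutive_ok
}

# Partial water year: the record starts on Dec 1, so Oct and Nov (61 days)
# are missing. It fails both rules and is dropped without a special case.
partial_year <- c(rep(TRUE, 61), rep(FALSE, 304))

# Seven isolated one-day gaps: passes both rules.
seven_single <- rep(FALSE, 365)
seven_single[seq(10, 190, by = 30)] <- TRUE

# One two-day gap: only 2 missing days in total, but the gap is too long.
one_double <- rep(FALSE, 365)
one_double[c(100, 101)] <- TRUE

results <- c(
  partial = keep_water_year(partial_year),
  seven_single = keep_water_year(seven_single),
  one_double = keep_water_year(one_double)
)
print(results)
```

Only `seven_single` is kept. A partial year at the start or end of the record always shows up as one long run of missing days, so it never needs separate handling.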

This function accepts a timeseries data frame, assesses/filters it according to these rules, then returns a new timeseries to the caller. A record is considered a gap if it is missing for a date, or if it is present but has a flow value of NA. The timeseries should already be *daily* data.
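Both kinds of gap can be found with the same check once the record is laid out on a complete daily sequence. The sketch below is one way to do that in base R; `find_gaps` and the column names `date` and `flow` are assumptions made for the example.

```r
# Hypothetical helper: flag every day between the first and last record as
# a gap if its row is absent or its flow value is NA.
find_gaps <- function(timeseries, date_field, flow_field) {
  dates <- as.Date(timeseries[[date_field]])
  all_days <- seq(min(dates), max(dates), by = "day")
  # match() returns NA for days with no row, so their flow is NA as well.
  flow <- timeseries[[flow_field]][match(all_days, dates)]
  data.frame(date = all_days, is_gap = is.na(flow))
}

# Jan 2 is present with an NA flow; Jan 3 has no row at all.
ts <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-04", "2020-01-05")),
  flow = c(5, NA, 7, 8)
)

gaps <- find_gaps(ts, "date", "flow")
print(gaps)
```

Jan 2 and Jan 3 are both flagged, and because they are adjacent they form a single two-day gap.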

This function is also staged to fill any remaining gaps in preparation for the online FFC disabling that functionality, but the code has not yet been enabled here.