WFNE: Waveforms with No Event information

This file describes the WFNE problem and actions taken to correct it.

Problem Description:
Some time after 2006, the NCEDC discovered a problem with event and waveform
information that had been transferred from the USGS CUSP system in Menlo
Park. Waveforms for some events had been added to the NCEDC Waveform and
related tables, and rows in the AssocWaE table related those waveforms to
event IDs. However, there was no corresponding event information in the NCEDC
parametric tables (Event, Origin, etc.). There were 87,516 sets of event
waveforms in this condition. The waveform times spanned the entire period of
CUSP waveform collection: 1984/03/01 to 2006/11/29. After that date, NCSS
event waveforms were collected by a modified AQMS system instead of CUSP.

This document will use "WFNE" to refer to a set of waveforms associated with a
single event ID whose event data was missing from the NCEDC databases.

There has not been a clear explanation of how this problem occurred.

Earthworm Processing:
Starting in 2018, I set up a system to pass all of these waveforms through
Earthworm in an attempt to identify and locate origins within the
waveforms. The Earthworm modules used for this effort were tankplayer,
pick_ew, pick_FP, pkfilter, and binder_ew. The Earthworm "sausage" modules
eqassemble, eq_coda and hyp2000_mgr were used to generate hyp2000 archive
files for each event. Three "sausage" chains ran at once: one for picks from
pick_ew, one for picks from pick_FP, and one for the combined pick stream from
both pickers, filtered by pkfilter. The combined pick stream was found to
generate the most complete set of event solutions. Earthworm's ew2file was
used to save the hyp2000 origins to files for each of the three sausages. To
prepare the waveforms, custom scripts ran each waveform through the Earthworm
utilities ms2tb and remux_tbuf, then fed each WFNE's tank file to
tankplayer. With tankplayer running at "real" speed (i.e., not accelerated),
it took about 4 months to process all WFNEs.

I made no attempt to compute coda durations or Md magnitudes during this
processing. It would be quite difficult to configure Earthworm coda parameters
to cover the 22 years of these waveforms.

After the Earthworm processing was completed, a script was used to create a
list mapping each WFNE ID to any binder IDs generated during the time period
covered by the WFNE.
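The mapping step amounts to an interval lookup. Below is a minimal Python sketch, assuming each WFNE is summarized by its earliest waveform start time and latest waveform end time, and that a binder ID belongs to a WFNE when its origin time falls inside that span. The `Wfne` class, `map_wfne_to_binder` and the `(origin_time, binder_id)` tuple format are hypothetical; this is not the script that was actually used.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Wfne:
    wfne_id: int
    start: float  # earliest waveform start time (epoch seconds)
    end: float    # latest waveform end time (epoch seconds)


def map_wfne_to_binder(wfnes, binder_origins):
    """Map each WFNE ID to the binder IDs whose origin time lies in [start, end].

    binder_origins: iterable of (origin_time, binder_id) tuples. This format is
    an assumption; real binder/hyp2000 archive files would need parsing first.
    """
    origins = sorted(binder_origins)       # sort by origin time
    times = [t for t, _ in origins]
    mapping = {}
    for w in wfnes:
        lo = bisect_left(times, w.start)   # first origin at or after start
        hi = bisect_right(times, w.end)    # one past the last origin at or before end
        mapping[w.wfne_id] = [bid for _, bid in origins[lo:hi]]
    return mapping
```

Sorting once and using binary search keeps the lookup fast even for tens of thousands of WFNEs against a long list of binder origins.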

WFNE Dispatching:
In 2020, I developed a C++ "dispatcher" program to display WFNE waveforms and
any event information that might be related to the WFNE. The event information
came from two places: any binder solutions produced previously, and events in
the NCEDC database with other event IDs occurring during the same time period
as the WFNE. The time window for this database query extended from 60 seconds
before the earliest start time of the WFNE waveforms to the latest end time of
the waveforms. The program displayed a table of these events and drew any
phase picks associated with each event over the appropriate waveform. When the
user selected an event from the table, the waveforms were sorted in order of
increasing distance from the event origin, and phase cues for the selected
event were displayed with the waveforms. This made it easy for the user to see
which event and its phase picks matched the WFNE waveforms.
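The query window and the distance ordering can be sketched as follows. This is a hypothetical Python illustration, not the C++ dispatcher: `Trace`, `query_window`, `epicentral_km` and `sort_by_distance` are invented names, station coordinates are assumed to be available for each SNCL, and distance uses a spherical-Earth haversine approximation, which may differ from what the dispatcher actually computed.

```python
import math
from dataclasses import dataclass

PRE_EVENT_PAD = 60.0  # seconds before the earliest waveform start, per the text
EARTH_RADIUS_KM = 6371.0


@dataclass
class Trace:
    sncl: str
    start: float  # epoch seconds
    end: float    # epoch seconds
    lat: float    # station latitude (assumed known)
    lon: float    # station longitude (assumed known)


def query_window(traces):
    """Time window for the database event query: (start - 60 s, latest end)."""
    return (min(t.start for t in traces) - PRE_EVENT_PAD,
            max(t.end for t in traces))


def epicentral_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a spherical Earth (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sort_by_distance(traces, ev_lat, ev_lon):
    """Order waveforms by increasing distance from the selected event origin."""
    return sorted(traces, key=lambda t: epicentral_km(ev_lat, ev_lon, t.lat, t.lon))
```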

Then the user decided how to dispatch the WFNE. Dispatch choices were:
- Delete the WFNE. This added Event and Origin rows for the event with
  Event.selectflag = 0 and Origin.datetime = median of the WFNE waveform start
  times.

- Mark as duplicate of an existing Database event and delete. This made the
  same database insertions as for "Delete", plus it added a remark (associated
  with the Event row) indicating that the WFNE event was a duplicate of the
  existing event.

- Mark WFNE as having damaged waveforms. This made the same database
  insertions as for "Delete", plus it added the WFNE ID to a file listing
  damaged event waveforms. See "Damaged Waveforms", later in this document.

- Save the selected binder solution to the database with this WFNE's ID. This
  inserted Event, Origin, Origin_Error, AssocAro and Arrival rows into the
  database.

- Save the selected binder solution with a new event ID. As above, this
  inserted Event, Origin, Origin_Error, AssocAro and Arrival rows into the
  database. In addition, rows were added to the AssocWaE table to associate
  the new event ID with the WFNE waveforms. (This is the same as what Jiggle
  does when the user "clones" an event.) This choice is used when the WFNE ID
  has already been used for some other case, such as another event within
  the same WFNE.

- Save the WFNE as a trigger event, since the waveforms show arrivals likely
  due to an earthquake not already in the database but for which no binder
  solution was formed. This action added Event and Origin rows for the event
  with Event.selectflag = 1 and Origin.datetime = median of the WFNE waveform
  start times. The user was prompted for the SNCL likely to be closest to the
  event origin (i.e., the SNCL of the waveform with the earliest likely
  arrival). This information was added to a file to help Jiggle users
  identify the event to be timed.

- Save the WFNE as a probable teleseism. This action added Event and Origin
  rows for the event with Event.selectflag = 1, Origin.datetime = median of
  WFNE waveform start times, and Origin.gtype = 't' (for teleseism).
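The dispatch choices above can be summarized as data. The sketch below is a hypothetical Python rendering, not the dispatcher's code: `Dispatch`, `WRITES` and `placeholder_origin` are invented names, and the entries for the remark and side files only restate the prose (the actual remark table and file formats are not described here). It shows which rows or files each action writes and how the non-binder actions fill Event.selectflag, Origin.datetime and Origin.gtype.

```python
import statistics
from enum import Enum


class Dispatch(Enum):
    DELETE = "delete"
    DUPLICATE = "duplicate"
    DAMAGED = "damaged"
    SAVE_BINDER = "save binder solution"
    SAVE_BINDER_NEW_ID = "save binder solution with new event ID"
    TRIGGER = "trigger"
    TELESEISM = "teleseism"


# Database rows and side files written by each action, as described in the text.
WRITES = {
    Dispatch.DELETE: ["Event", "Origin"],
    Dispatch.DUPLICATE: ["Event", "Origin", "remark"],
    Dispatch.DAMAGED: ["Event", "Origin", "damaged-waveform list file"],
    Dispatch.SAVE_BINDER: ["Event", "Origin", "Origin_Error", "AssocAro", "Arrival"],
    Dispatch.SAVE_BINDER_NEW_ID: ["Event", "Origin", "Origin_Error", "AssocAro",
                                  "Arrival", "AssocWaE"],
    Dispatch.TRIGGER: ["Event", "Origin", "closest-SNCL file"],
    Dispatch.TELESEISM: ["Event", "Origin"],
}

# Actions that write a placeholder origin instead of a binder solution.
_PLACEHOLDER = {Dispatch.DELETE, Dispatch.DUPLICATE, Dispatch.DAMAGED,
                Dispatch.TRIGGER, Dispatch.TELESEISM}
# "Delete" and the actions that make the same insertions as "Delete".
_SELECTFLAG_ZERO = {Dispatch.DELETE, Dispatch.DUPLICATE, Dispatch.DAMAGED}


def placeholder_origin(action, start_times):
    """Column values for the Event/Origin rows of a non-binder dispatch."""
    if action not in _PLACEHOLDER:
        raise ValueError(f"{action} takes its origin from the binder solution")
    row = {
        "Event.selectflag": 0 if action in _SELECTFLAG_ZERO else 1,
        "Origin.datetime": statistics.median(start_times),  # median waveform start
    }
    if action is Dispatch.TELESEISM:
        row["Origin.gtype"] = "t"
    return row
```

Using the median rather than the earliest start time keeps the placeholder origin time from being pulled off by a single waveform with an unusual start.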

I used the WFNE dispatcher program to process all of the WFNEs. This took 128
days, finishing in February 2021. I used the following guidelines in deciding
how to dispatch each WFNE.
- If there was a binder solution for an earthquake not already in the
  database, and the solution adequately described the earthquake, I saved the
  solution. I tried to avoid saving solutions with obvious bad picks or poor
  RMS residuals. If the binder solution was poor, I saved the WFNE as a
  trigger event, noting the likely earliest arrival.

- If the WFNE waveforms appeared to be for a teleseism, I dispatched the WFNE
  as a teleseism. A number of WFNEs had a duration of a little more than 1800
  seconds and were made up of waveforms decimated to a lower-than-normal
  sample rate. These WFNEs were assumed to be for teleseisms even if the usual
  teleseism characteristics were not obvious.

- If the WFNE waveforms showed arrivals that appeared to be for a locatable
  earthquake, and that earthquake was clearly NOT in Southern California, the
  WFNE was saved as a trigger. WFNEs for Southern California events were
  generally deleted, as requested by David Oppenheimer.

- In some cases there were multiple events that needed to be saved. For
  multiple trigger events (no binder solutions), I saved a single trigger but
  listed the likely first-arrival SNCLs for each event. If there were binder
  solutions for some events but not others, I saved one trigger event with one
  or more first arrivals listed, and also saved one or more binder solutions
  with new event IDs.

- If the WFNE waveforms showed no arrivals (or fewer than four), or the
  waveforms showed arrivals for a Southern California event with no binder
  solution, I deleted the event.

- I avoided saving triggers for foreshocks or aftershocks of events within
  the same WFNE. However, if there were earthquakes from other locations
  (e.g., Geysers vs. Mammoth), I saved a trigger event.

- The WFNEs included a large number of events, probably from western
  Nevada. Since there were few if any waveforms from stations east of these
  events, the binder locations were often too far west of the likely quake
  location, frequently moving the solution into the NC region. I saved nearly
  all of these events so that analysts timing them can set appropriate
  locations.

- Occasionally I found archived event solutions that were obviously in
  error. This was particularly obvious when a binder solution was much better
  than the archived solution. I noted these problems in comments files.
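The core of these guidelines can be expressed as a small decision function. This is a simplified, hypothetical Python sketch, not a tool that was used: the `Review` fields and `suggest_dispatch` are invented, it assumes a single earthquake not already in the database, and it ignores the multi-event, foreshock/aftershock, western Nevada and archived-solution cases, which called for judgment. Because the text says Southern California WFNEs were "generally" deleted, the sketch deletes them all and leaves exceptions to the reviewer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    """What the reviewer saw for one WFNE (hypothetical summary fields)."""
    n_arrivals: int            # number of visible arrivals
    teleseism: bool            # looks teleseismic, incl. ~1800 s decimated WFNEs
    socal: bool                # locatable event in Southern California
    binder_ok: Optional[bool]  # None: no binder solution; else adequate or not


def suggest_dispatch(r: Review) -> str:
    """Apply the main guidelines to one event not already in the database."""
    if r.teleseism:
        return "teleseism"
    if r.n_arrivals < 4:
        return "delete"                 # no arrivals, or fewer than four
    if r.socal:
        return "delete"                 # "generally" deleted; exceptions are manual
    if r.binder_ok:
        return "save binder solution"   # adequate solution for a new earthquake
    return "trigger"                    # poor or missing binder: note earliest arrival
```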

Damaged Waveforms:
During review of the Waveforms with No Event information (WFNEs), I found some
event waveform collections that appear to have been damaged some time after
they were acquired, either within CUSP or during the conversion from CUSP
mem/grm to miniSEED. This damage is probably similar to what was found several
years ago in event waveforms at the NCEDC.

There are basically two versions of the damage: 788 events in 1997 - 1999,
and 4 events in 2000 and 2001.

Group A 1997 - 1999: 
- A single transition (at the same time in all affected waveforms of an
  event) where the DC offset, noise level and signal characteristics change in
  most waveforms of the event. This transition appears near the start, in the
  middle, or near the end of the waveforms.

- Sometimes the transition is a simple step. At other times, there is an
  intervening spike with one or more peaks.

- The most obvious problem is that the IRIG-E time signal appears in SNCLs
  that normally carry seismic data. A cursory check does not show this
  happening for the WWVB (IRIG-H) channel. Note that Jiggle is normally
  configured (via a DB table) not to show time code SNCLs; my WFNE event
  viewer shows waveforms for all SNCLs.

- Decoding the IRIG-E time signal on several events shows it to be offset from
  the nominal miniSEED time, perhaps by the time interval from start of trace
  to transition step.

- The times in the Waveform DB table appear to match with the miniSEED
  waveform times.

- Phase arrivals for archived events within the miniSEED time span do not
  match the phase arrivals visible in the waveforms.

- Picker/binder event solutions on these damaged waveforms generally have
  high RMS. Presumably this is because the waveforms are not tied to the
  correct SNCL, and hence the sensor locations of the waveforms are incorrect.
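The Group A pattern suggests a check that could be automated. The sketch below is not how the damage was found (that was visual review in the dispatcher); it swaps in a standard least-squares single change-point estimate to locate the DC-offset step, then tests the hypothesis above that the IRIG-E offset equals the time from trace start to the step. The function names, the 0.1 s tolerance, and the assumption that IRIG-E has been decoded separately are all hypothetical.

```python
def locate_step(samples, min_seg=1):
    """Index k that best splits samples into two constant-mean segments.

    Least-squares single change point in the mean: maximize
    k*(n-k)/n * (mean(x[:k]) - mean(x[k:]))**2, which equals the drop in
    squared error from fitting two means instead of one.
    """
    n = len(samples)
    if min_seg < 1 or n < 2 * min_seg:
        raise ValueError("trace too short for the requested segment length")
    prefix = [0.0]
    for v in samples:
        prefix.append(prefix[-1] + v)        # prefix[k] = sum(samples[:k])
    total = prefix[-1]
    best_k, best_score = min_seg, float("-inf")
    for k in range(min_seg, n - min_seg + 1):
        left = prefix[k] / k
        right = (total - prefix[k]) / (n - k)
        score = k * (n - k) / n * (left - right) ** 2
        if score > best_score:
            best_k, best_score = k, score
    return best_k                            # first sample after the step


def offset_matches_step(irig_minus_nominal, step_index, sample_rate, tol=0.1):
    """Does |decoded IRIG-E time - nominal miniSEED time| match the
    interval from trace start to the step (within tol seconds)?"""
    step_interval = step_index / sample_rate
    return abs(abs(irig_minus_nominal) - step_interval) <= tol
```

On real traces, earthquake signal and the intervening spikes described above can bias this estimate; smoothing or median-filtering the trace first may help.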

Group B 2000 - 2001:
- Several transitions in each waveform.

- The most obvious symptom is that small blocks of IRIG-E time code appear in
  seismic SNCLs. These are too short to decode. (One IRIG-E frame is 10
  seconds long.)
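Whether a block of time code can be decoded is largely a matter of its length. A minimal sketch, assuming some hypothetical upstream step has already flagged which samples look like IRIG-E code: find contiguous runs of flagged samples and compare each run's duration with the 10-second IRIG-E frame. Being at least one frame long is necessary but not sufficient, since a run shorter than two frames may not contain a complete frame, depending on where it starts. `flagged_runs` and `irig_blocks` are invented names.

```python
IRIG_E_FRAME_S = 10.0  # one IRIG-E frame is 10 seconds long


def flagged_runs(mask):
    """(start, end) index pairs of contiguous True runs; end is exclusive."""
    runs, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i                      # a run begins
        elif not flag and start is not None:
            runs.append((start, i))        # a run ends
            start = None
    if start is not None:
        runs.append((start, len(mask)))    # run continues to the end of the trace
    return runs


def irig_blocks(mask, sample_rate):
    """(start, end, duration_s, at_least_one_frame) for each flagged block."""
    out = []
    for s, e in flagged_runs(mask):
        duration = (e - s) / sample_rate
        out.append((s, e, duration, duration >= IRIG_E_FRAME_S))
    return out
```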

It is not obvious to me how this damage could have occurred during conversion
from CUSP format to miniSEED. Although I have no knowledge of CUSP internals,
it seems likely to me that this damage happened within CUSP or while CUSP was
writing its output files.

My guess is that it would take a very large amount of work to correct these
damaged waveforms. It might be better to simply remove them from the normally
accessible part of the NCEDC, to avoid serving them to users.

Summary Results:
On the NCEDC web site, I have uploaded summaries of all the WFNE dispatch
actions. These are in http://www.ncedc.org/ftp/outgoing/lombard/WFNE/. For
each year 1984 - 2006, there are files named "add.log_YEAR" and
"delete.log_YEAR". These files list how each WFNE was dispatched. For binder
solutions listed in the add.log* files, I also provide a summary of the origin
(time, location, etc.). For applicable years, there are also files listing my
comments and the damaged WFNE waveforms.
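The dispatch logs follow a regular naming pattern, so the full list can be generated. A minimal sketch using only the Python standard library; it assumes the directory and file names are still as given above, which may no longer hold if the files have moved. `log_urls` and `fetch` are hypothetical helpers.

```python
from urllib.request import urlopen

BASE_URL = "http://www.ncedc.org/ftp/outgoing/lombard/WFNE/"


def log_urls(first_year=1984, last_year=2006):
    """URLs of the add.log_YEAR and delete.log_YEAR files for each year."""
    return [f"{BASE_URL}{kind}.log_{year}"
            for year in range(first_year, last_year + 1)
            for kind in ("add", "delete")]


def fetch(url, timeout=30):
    """Download one log as text (requires network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `texts = [fetch(u) for u in log_urls()]` would retrieve all 46 logs; the log format is not described here, so parsing is left open.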

Finally, I made a spreadsheet with numbers for each year. These are summ.csv
(simple CSV file) and summ.odf.ods (OpenOffice format; I can't explain the
filename extensions!)


Additional Tasks To Consider:
- Compute Md magnitudes for the binder solutions saved for WFNEs.

- Act on the events listed in the comments files.

- Review events with binder solutions. As always, automatic picks are far from
  perfect. 

- Time the events saved as triggers. Given the large number of triggers
  already in the database waiting for review, I assume this step will not
  happen any time soon.

- Review (formerly) WFNE waveforms for fore- and after-shocks not already
  identified. This review must include "deleted" WFNE events. This should be
  part of a comprehensive review of NCEDC waveforms for unidentified
  events. This is a very large project.