Anomaly Detection in Seismic Data

Using machine learning to find anomalies in high-dimensional data.

Project Description:

The purpose of this project is to detect discords, that is, anomalous snippets, in Antarctic seismic data. Many seismic readings are collected from devices around Antarctica, but each snippet comes as 8 channels of 64 x 313 matrices, which means that each snippet contains 8 x 64 x 313 = 160,256 values. It is therefore necessary to reduce the dimensionality of the data before different snippets can be compared meaningfully.
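As a quick check of the arithmetic above, the size of one snippet can be computed from its shape. This is a minimal sketch; the names `SNIPPET_SHAPE` and `values_per_snippet` are made up for illustration and are not part of the project code.

```python
from math import prod

# Shape of one snippet as described above: 8 channels of 64 x 313 matrices.
# (The name SNIPPET_SHAPE is hypothetical, chosen for this example.)
SNIPPET_SHAPE = (8, 64, 313)


def values_per_snippet(shape=SNIPPET_SHAPE):
    """Return the number of scalar values in one snippet of the given shape."""
    return prod(shape)


print(values_per_snippet())  # 160256
```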

On the left is an example of a nonanomalous spectrogram. On the right is a spectrogram with anomalies.

There are many known methods of dimensionality reduction that retain important features. PCA and other non-deep-learning techniques work very well with limited data, but because I had 1000 examples of nonanomalous data, I decided to use a 3-dimensional autoencoder. With that much training data, I expected an autoencoder to extract features better than the non-deep-learning techniques would.

A summary of the autoencoder model that was used.

After 200 epochs of training, the autoencoder was ready. All training data was passed through the autoencoder to reduce its dimensionality, and the mean and standard deviation of the reduced training dataset were computed. The test data was then reduced in the same way and checked against the training mean and standard deviation. Any test snippet that was an outlier relative to those statistics was marked as an anomaly. In the end, the autoencoder detected all 143 anomalies present in the test data!
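The outlier check described above could look roughly like the sketch below. The exact rule used in the project is not stated here, so several things are assumptions: the reduced data is taken to be a short list of numbers per snippet, a snippet is flagged when any dimension lies more than 3 standard deviations from the training mean, and the names `fit_stats` and `is_anomaly` and the toy values are made up for illustration.

```python
from statistics import mean, pstdev


def fit_stats(train_codes):
    """Per-dimension mean and standard deviation of the reduced training data."""
    columns = list(zip(*train_codes))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means, stds


def is_anomaly(code, means, stds, threshold=3.0):
    """Flag a reduced snippet if any dimension is an outlier.

    Assumption: "outlier" means more than `threshold` standard deviations
    from the training mean; the threshold of 3.0 is illustrative only.
    """
    for x, m, s in zip(code, means, stds):
        if s == 0:
            # No spread in training data: any deviation counts as an outlier.
            if x != m:
                return True
            continue
        if abs(x - m) / s > threshold:
            return True
    return False


# Toy reduced training data (hypothetical values, not real project output).
train_codes = [
    (0.0, 1.0, 2.0),
    (0.2, 1.1, 1.9),
    (-0.1, 0.9, 2.1),
    (0.1, 1.0, 2.0),
]
means, stds = fit_stats(train_codes)

print(is_anomaly((0.05, 1.0, 2.0), means, stds))  # False
print(is_anomaly((5.0, 1.0, 2.0), means, stds))   # True
```

Only the training data is used to compute the statistics, so the test snippets never influence the threshold they are judged against.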