Daniel Medeiros

Doctorand @ KTH
Data Engineer / DevOps


On Reproducibility of Papers: Magnetic Alignments

Editor’s note: there’s an update to this post here.

From times to times, I like to reproduce papers that will ease my life at the Geological Survey of Brazil. However, I usually stumble exactly in reproducibility issues - lack of the implementation code, parameters or even the shown dataset - and that’s exactly what happened when I tried to reproduce the paper “Towards the automated analysis of regional aeromagnetic data”, from Holden et al.

I have chosen this paper due to the constant need of selecting the magnetical structures from data. This paper uses the contrast in colors (black/white)to do this.

So, first we have our data. This dataset is known as “Magnetic Anomaly”, has a 52x52km area and is the measured magnetic intensity in the area minus the Earth geomagnetic field (IGRF). The area is located at the state of Bahia, Brazil.

import tifffile from skimage.io import imsave from skimage.io import imread from matplotlib import pyplot as plt from matplotlib.pyplot import savefig

data = tifffile.imread('iramaia_am.tif') plt.imsave('iramaia_am.jpg', data)

Through Skimage’s rgb2gray function, we transform the 3-channel RGB image into a single-channel grayscale.

from skimage.color import rgb2gray grayscale = rgb2gray(data) plt.imsave('grayscale.jpg', grayscale, cmap=plt.cm.gray)

From now on, there are two ways to proceed: calculating either the entropy of the image or the standard deviation. The former was suggested by the present paper and can be simply done with Skimage’s Entropy function, while the latter was used by a newer paper by the same author and envolves some operations with NumPy. I’ll stick with the Entropy one, but I’ll show both here.

Calculating the standard deviation for a 9x9 window is pretty much straightforward:
import numpy as np from skimage.filters import gaussian mult_1 = np.multiply(grayscale, grayscale) blur_1 = gaussian(mult_1, sigma=(9, 9), truncate=3.5, multichannel=False) blur_2 = gaussian(grayscale, sigma=(9, 9), truncate=3.5, multichannel=False) mult_2 = np.multiply(blur_2, blur_2) std_hat = np.sqrt(blur_1 - mult_2)

#Remove all values that are less than zero from numpy import isnan where_are_NaNs = isnan(std_hat) std_hat[where_are_NaNs] = 0 plt.imsave('std.jpg', std_hat, cmap=plt.cm.gray)

The result is pretty good, except that we get the border of the image as an alignment and it’s actually blurry. Calculating the Entropy solves a bit of those issues:

from skimage.morphology import square from skimage.filters.rank import entropy sigma_hat = entropy(grayscale, square(9)) plt.imsave('entropy.jpg', sigma_hat, cmap=plt.cm.gray)

The issue with Entropy is that it puts too many features in the resulting image. In particular, color contrasts on the top-left corner are also visible within the entropy image, although weakened. The next step is pretty much a solution for it: a filter created by Peter Kovesi (or at least I think so) known as Phase Symmetry. It was originally written for Matlab, but was recently converted to python and is located at the Phasepack package.

This is where the paper shows its problem: no parameters at all, and I’m essentially trying my luck here. By pure tests, I’ve discovered that both K=20 and minWaveLength=4 will reduce a lot of undesired structures, and polarity=1 shows what I want to see. However, I believe those parameters vary on a case-by-case basis (time for machine learning?!).

from phasepack import * psym = phasesym(sigma_hat, nscale=5, norient=6, minWaveLength=4, mult=2.1, sigmaOnf=0.55, k=20., polarity=1, noiseMethod=-1) plt.imsave('psym.jpg', psym, cmap=plt.cm.gray)

This is a pretty good result as it is more focused on the strongest contours from the entropy image. However, the same top-left features are still present (although weakened) and we don’t want them as they’re (probably) not magnetic structures. The paper presents the “thresholding” as solution for this issue.

As I am not a specialist in image processing, I took a good time researching this. “Thresholding” has a lot of algorithms, and the paper does not discuss any of this. Hopefully, scikit-image has some algorithms that fastens our processing. Let’s run all the thresholding algorithms.

from skimage.filters import try_all_threshold fig, ax = try_all_threshold(psym, figsize=(16, 12), verbose=False) fig.savefig('threshold.png')

In a nutshell, thresholding binarizes the images between 0 and 1. Most of the thresholding algorithms (Mean, Triangle and Yen) above puts an abnormal number of bright spots, which I associated to undesired features/noise. Arbitrarily, I decided to use Otsu.

Now, I don’t want use a single thresholding mode for the entire image. I decided to use the ‘local’ threshold version, which allows me to be more selective with features.

from skimage.morphology import square from skimage.filters import threshold_otsu, rank from skimage.util import img_as_ubyte

selem = square(9) local_otsu = rank.otsu(psym, selem) threshold_global_otsu = threshold_otsu(psym) local_otsu_mod = psym >= T local_otsu_mod = local_otsu_mod >= local_otsu plt.imsave('local_otsu.jpg', local_otsu_mod , cmap=plt.cm.gray)

A lot of noise was created, and the left part issues was aggravated - but this is the best I got so far since I don’t know which threshold the paper used. Anyway, using GDAL’s Polygonyze to transform this image into shape is pretty much a no-brainer, although I believe this methodology is not production-ready yet.

Note 1: I didn`t contact the author to ask reproducibility details. I came in contact to this paper because it’s the basis for a paid plugin on Seequent’s Oasis montaj for the exact same task. Since the plugin is actually developed by the University of Western Australia (where the paper came from), I assume the author is envolved and probably would not share the code. Nevertheless, the paper’s reviewers did not address reproducibility issues, and that is a major issue here.
Note 2: While most of the imports are usually put together, I’ve split them for this post.
Note 3: The actual dataset has over 40mb in image size and a pretty large resolution, thus the processing takes some time. I did resample every image for this post.