Note: The next two or three posts will have their code written in Python. After that, I’ll be moving exclusively to Julia due to performance issues.
I was recently asked to evaluate whether there’s any relationship between the geophysical signatures of magnetic and radiometric data and iron mineralizations. The dataset for the study area is about 10 gigabytes, with nearly 40 million lines, which is too much for both my notebook and the personal server that runs this website’s nginx.
But the main point here is that there aren’t many iron-mineralized points to build a decent model. Actually, there were only 10 confirmed points for a really large area. Well, I decided to try anyway, but instead of using the actual measurement values, my solution was to resort to RGB pixels.
I had two types of data: magnetic and radiometric. The former is more related to deeper structures than the latter (which has a depth of a few meters, or even centimeters). From the magnetic data, I decided to generate a mesh with the total gradient/analytic signal. For the radiometric data, I generated 6 meshes: the amounts of potassium, thorium and uranium, and the ratios between them. However, those meshes are in RGB with 256 levels per channel, which would give me a very, very large spectrum of colors to work with. I simplified them by reducing each one to a 5-class color scale, ranging from “Very Low” to “Very High”.
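To make the five-class idea concrete, here is a minimal sketch of that kind of binning. It is not the exact recoloring applied to the meshes: it assumes equal-width classes over a known value range, and the three intermediate labels ("Low", "Medium", "High") and the function name `five_class` are placeholders I picked for illustration.

```python
from bisect import bisect_right

# Only "Very Low" and "Very High" come from the text; the middle labels are assumed.
LABELS = ["Very Low", "Low", "Medium", "High", "Very High"]

def five_class(value, vmin, vmax):
    """Map a value in [vmin, vmax] to one of five equal-width classes."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    step = (vmax - vmin) / 5
    # Four interior boundaries; values outside the range fall into the end classes.
    edges = [vmin + step * i for i in range(1, 5)]
    return LABELS[bisect_right(edges, value)]
```

Quantile-based boundaries would be an equally reasonable choice when the values are heavily skewed, which is common for radiometric ratios.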
The second step was all about getting RGB values for the training dataset we had. Both iron and non-iron points had their coordinates in geographic values, so I simply converted them to UTM (because my GeoTIFF images were in UTM) and then converted those UTM values to pixel positions. Each RGB channel had its own column in a pandas dataframe, totaling about 21 columns and ~200 rows in the dataset. Adding the labels was pretty much a no-brainer.
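The UTM-to-pixel step boils down to a bit of arithmetic on the image's georeferencing. The sketch below assumes a north-up image with square pixels and no rotation. The function name and its parameters are hypothetical, and the geographic-to-UTM conversion itself is left to a projection library.

```python
def utm_to_pixel(easting, northing, origin_x, origin_y, pixel_size):
    """Return (row, col) of the pixel containing a UTM coordinate.

    origin_x / origin_y are the UTM coordinates of the image's top-left
    corner; pixel_size is the ground size of a square pixel in metres.
    Assumes a north-up image with no rotation.
    """
    col = int((easting - origin_x) // pixel_size)
    row = int((origin_y - northing) // pixel_size)  # rows grow southwards
    return row, col
```

The resulting `(row, col)` pair can then be used to index each image array and read the three channel values for that point. Points that fall outside the image produce negative or too-large indices, so they should be filtered before indexing.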
Anyway, since I would be predicting the probability of each pixel having iron or not, I also scanned all pixels from the seven images in order to build the prediction dataset. It took nearly six hours to process, since interpreted Python code is pretty slow (and I couldn’t figure out how to write what I wanted in NumPy code). Over 400,000 rows with 21 features each: a pretty good dataset to toy with.
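For what it's worth, the pixel scan can probably be done without a Python loop. This is a sketch rather than the code from the notebook: it assumes the seven images are already loaded as NumPy arrays of identical shape `(height, width, 3)`, and `build_prediction_table` is a name made up for this example.

```python
import numpy as np

def build_prediction_table(images):
    """Stack same-sized RGB images into one row per pixel.

    images: list of arrays shaped (height, width, 3).
    Returns an array shaped (height * width, 3 * len(images)) in row-major
    pixel order, so row i corresponds to pixel (i // width, i % width).
    """
    shapes = {img.shape for img in images}
    if len(shapes) != 1:
        raise ValueError("all images must share the same shape")
    stacked = np.concatenate(images, axis=2)       # (H, W, 3 * n_images)
    return stacked.reshape(-1, stacked.shape[2])   # (H * W, 3 * n_images)

# Demo on seven small random "images" standing in for the real GeoTIFFs.
rng = np.random.default_rng(0)
demo_images = [rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8) for _ in range(7)]
table = build_prediction_table(demo_images)
```

With seven images this gives 21 columns, three per image, in the same order as the input list. The concatenate and reshape are single array operations, so they should take seconds rather than hours on a few hundred thousand pixels.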
After this, a more direct approach: normalizing all the data, building a 9x9x1 neural network with a hyperbolic tangent at the output layer, training it on the initial dataset… and predicting against each row of the total dataset. I didn’t even bother with a test dataset for validation because the training dataset was already too small.
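A network like that is small enough to write out in plain NumPy. The sketch below is not the notebook's implementation. It reads "9x9x1" as two hidden layers of 9 units plus one output, assumes tanh in the hidden layers too, uses full-batch gradient descent on mean squared error with labels of -1 and +1, and trains on synthetic data. All function names and hyperparameters are made up for the example.

```python
import numpy as np

def standardize(X, mean=None, std=None):
    """Scale features to zero mean and unit variance (constant columns stay 0)."""
    mean = X.mean(axis=0) if mean is None else mean
    std = X.std(axis=0) if std is None else std
    std = np.where(std == 0, 1.0, std)
    return (X - mean) / std, mean, std

def init_params(n_in, rng, hidden=(9, 9)):
    """Random weights and zero biases for n_in -> 9 -> 9 -> 1."""
    sizes = [n_in, *hidden, 1]
    return [(rng.normal(0, 1 / np.sqrt(a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Return the activations of every layer; tanh is used throughout."""
    acts = [X]
    for W, b in params:
        acts.append(np.tanh(acts[-1] @ W + b))
    return acts

def train(params, X, y, lr=0.05, epochs=500):
    """Plain full-batch gradient descent on mean squared error."""
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        acts = forward(params, X)
        # Gradient of the loss w.r.t. the output layer's pre-activation.
        delta = 2 * (acts[-1] - y) / len(X) * (1 - acts[-1] ** 2)
        for i in reversed(range(len(params))):
            W, b = params[i]
            grad_W = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                # Propagate through W before it is updated.
                delta = (delta @ W.T) * (1 - acts[i] ** 2)
            params[i] = (W - lr * grad_W, b - lr * grad_b)
    return params

def mse(params, X, y):
    return float(np.mean((forward(params, X)[-1].ravel() - y) ** 2))

# Synthetic stand-in for the ~200-row, 21-feature training table.
rng = np.random.default_rng(42)
X_raw = rng.integers(0, 256, size=(200, 21)).astype(float)
y = np.where(X_raw[:, 0] > 127, 1.0, -1.0)   # made-up labels, not real data

X, mean, std = standardize(X_raw)
params = init_params(X.shape[1], rng)
loss_before = mse(params, X, y)
params = train(params, X, y)
loss_after = mse(params, X, y)
scores = forward(params, X)[-1].ravel()       # one score in [-1, 1] per row
```

To score the full pixel table, the same `mean` and `std` from the training data should be passed to `standardize`, so that both datasets are scaled identically.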
The prediction analysis was pretty much done in a geological context. There were over 1000 points with a prediction score over 0.9 (on a scale from -1 to 1). Those points appeared in a random pattern over all geological features, some of which are already known not to contain iron. This leads me to conclude that the provided dataset wasn’t enough to establish a good model between the geophysical signatures and the mineralizations (as I expected), or at least that the provided image inputs aren’t really correlated with the problem.
Note: The Jupyter notebook, as well as the inputs and outputs, is available on GitHub.