The Search Dilemma – Part 5

This really took some time, but it’s here! My paper “Profile-guided frequency scaling for Latency-Critical Search Workloads” was accepted and presented at CCGrid’21, and is pending only indexing at the ACM/IEEE digital libraries. I am therefore leaving the preprint version here, as I am not sure about the policies regarding sharing of the indexed paper.

A brief explanation of the paper is below.

Continue reading

Plugins for Oasis montaj

This is probably a very specific piece of work, but I reckon my current geophysical job consists of 70% pressing buttons and 30% actual thinking. That’s why I made a plugin for Seequent’s Oasis montaj that generates and exports all the maps I need with one click.

It’s mainly for magnetic and radiometric data, and there are plenty of compromises (using Bigrid as the sole gridding option, for example). Nonetheless, it will save me hours whenever someone from the company asks me to grid all the geophysical data for a given area.

More info at my GitHub.

P.S.: I’ll eventually update a lot of things in this blog.

Ternary Decomposer – C++ Update

I just rewrote the ternary decomposer in C++. The speed gains are substantial (Python is awfully slow in its loops), but it took me three days to figure out that ESRI’s ArcGIS appears to have an issue with Float32 TIFFs, so I had to use unsigned int16 instead. I’ll update it someday to use GDAL’s Polygonize, as the Python version does.
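For the curious, here is a minimal sketch of that raster/vector workflow in Python with GDAL – writing the class grid as an unsigned 16-bit GeoTIFF (the workaround for the ArcGIS Float32 issue) and running GDAL’s Polygonize on it. The class array, file names, georeferencing and EPSG code below are placeholders, not the actual decomposer code:

```python
# Sketch: write ternary classes as a UInt16 GeoTIFF and polygonize them.
# The `classes` array, file names and georeferencing are placeholders.
import numpy as np
from osgeo import gdal, ogr, osr

classes = np.random.randint(0, 27, size=(512, 512)).astype(np.uint16)  # fake class grid

driver = gdal.GetDriverByName("GTiff")
raster = driver.Create("ternary_classes.tif", classes.shape[1], classes.shape[0],
                       1, gdal.GDT_UInt16)                  # UInt16 instead of Float32
raster.SetGeoTransform((500000, 100, 0, 8000000, 0, -100))  # placeholder geotransform
srs = osr.SpatialReference()
srs.ImportFromEPSG(32723)                                   # placeholder UTM zone
raster.SetProjection(srs.ExportToWkt())
band = raster.GetRasterBand(1)
band.WriteArray(classes)

# Vectorize the classes into polygons, as the Python version does.
shp = ogr.GetDriverByName("ESRI Shapefile").CreateDataSource("ternary_classes.shp")
layer = shp.CreateLayer("classes", srs=srs, geom_type=ogr.wkbPolygon)
layer.CreateField(ogr.FieldDefn("class", ogr.OFTInteger))
gdal.Polygonize(band, None, layer, 0, [], callback=None)

raster.FlushCache()
shp = None
raster = None
```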

A compiled binary release is available on my GitHub.

The Search Dilemma – Part 4

Note: This part produced a few hundred gigabytes of experimental data, so I’ll show only the important points. Also, the next part will probably be the last one.

We spoke about Elasticsearch in the first three posts of this series. However, because instrumenting the Java Virtual Machine introduced a large overhead – on top of the overhead caused by migrations between big and LITTLE cores – we opted to change both the benchmarking engine and the processor type.

Hence, we started working with a version of Xapian modified by Tailbench [1], a benchmarking suite. Xapian is the actual search engine core, written in C++, while Tailbench provides a simplified API for configuring both the server and the client (which also generates the requests). Finally, we moved from big.LITTLE/AMP processors to DVFS-capable ones.
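The post doesn’t go into how the frequencies are actually driven, but on Linux the usual knob for a DVFS-capable core is the cpufreq sysfs interface. A minimal sketch, assuming root privileges and a cpufreq driver that exposes the userspace governor (the target frequency below is illustrative):

```python
# Sketch: pin one core to a fixed frequency through Linux cpufreq sysfs.
# Assumes root privileges and a cpufreq driver with the "userspace" governor.
CPU = 0
TARGET_KHZ = 1_800_000  # illustrative; check scaling_available_frequencies first

base = f"/sys/devices/system/cpu/cpu{CPU}/cpufreq"

def read(name: str) -> str:
    with open(f"{base}/{name}") as f:
        return f.read().strip()

def write(name: str, value: str) -> None:
    with open(f"{base}/{name}", "w") as f:
        f.write(value)

print("available:", read("scaling_available_frequencies"))
write("scaling_governor", "userspace")      # take control away from ondemand/schedutil
write("scaling_setspeed", str(TARGET_KHZ))  # fix the core at the chosen frequency
print("now at:", read("scaling_cur_freq"), "kHz")
```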

Continue reading

Python/Julia Packages for Scientific Computing in Geosciences

The best thing that has ever happened to Python is NumPy. Python is an easy language to learn overall, and NumPy – written in good old C – lets everything run seamlessly, which helped a lot in making Python rise as a scientific language for fields like image processing, machine learning and remote sensing.

On the other hand, Julia is a newer language that was in an almost-perpetual beta until a year or two ago. Unlike Python, its entire standard library is written in Julia itself. This lets Julia run even faster than NumPy, without the worry of excessive code optimizations (e.g. operation broadcasting/vectorization, or whether your NumPy build uses MKL or OpenBLAS). Most benchmarks compare Julia against heavily optimized and compiled Python code (via Nuitka or Cython) and, well, if you are going that far to write fast code, you might as well change engines.
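To make the “excessive code optimizations” point concrete: in Python you are expected to push loops down into NumPy (and then care about which BLAS backs it), whereas in Julia a plain loop is already fast. A tiny NumPy-side illustration:

```python
import numpy as np

x = np.random.rand(1_000_000)

# Naive Python loop: interpreted, painfully slow on large arrays.
total = 0.0
for v in x:
    total += v * v

# Idiomatic NumPy: the loop runs in C as a vectorized reduction.
total_np = np.dot(x, x)

# Whether that dot product hits MKL or OpenBLAS depends on your NumPy build:
np.show_config()
```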

Julia’s issues are that it is still very new (many packages are not mature enough or are barely maintained, unlike Python’s), and that part of its design is a new concept for those used to traditional object-oriented programming – although, in particular, its implementation of multiple dispatch is awesome. Finally, Julia has a slow startup time, since it compiles the actual code before running it – and that might be a hassle for some types of applications.

But this post is not a comparison of Julia and Python. Rather, it’s a rant about how I saw three or four “geo” programming courses last week (mostly advertised on LinkedIn and Facebook) and none of them included the genuinely important packages for geoscientists in their syllabi.

I’ll list my take on this.

Continue reading

From .NET Framework to .NET 5

Note: .NET 5 is still in preview and does not officially support Visual Basic .NET Forms as of April 23, 2020.

It has been over a decade since I wrote my last code in Visual Basic 6. A bunch of friends wanted to revive an old game engine, written in Visual Basic .NET, from a community I was part of, and I accepted – hence I am currently toying with Elysium in my free time.

VB6 stopped being supported by Microsoft a long time ago. While Elysium itself deserves a separate post, I wanted to make the server run on Linux so the team could create a default map together. Thus, I set out to convert the project from .NET Framework 4.5 to .NET 5 (which also means converting to the new “Core” runtime).

Continue reading

On ‘Indexing Wikipedia at Elasticsearch’

After nearly a year working with Tailbench (and Xapian), I had to turn back to Elasticsearch. This happened because Tailbench/Xapian has a lot of limitations that would require a heavy code rewrite to bypass – and, well, I simply lacked the time to do it.

While I won’t rehash the issues I had with the Java Virtual Machine (I’ll cover later the ideas we had for circumventing all the bottlenecks), this is essentially a post that might be found by someone facing the same issues as me.

Continue reading

Pattern Recognition for Images through Neural Nets

Note: The next two or three posts will have their code written in Python. After that, I’ll be moving exclusively to Julia due to performance issues.

I was recently asked to evaluate whether there is any kind of relationship between the geophysical signatures of magnetic and radiometric data and iron mineralizations. The dataset for the study area is about 10 gigabytes, with nearly 40 million lines – too much for both my notebook and the personal server that runs this website’s nginx.

But the main point here is that there aren’t many iron-mineralized points to build a decent model from. Actually, there were only 10 confirmed points for a really large area. Well, I decided to try anyway, but instead of using the actual measurement values, my solution was to resort to RGB pixels.
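In rough terms – this is a sketch of the general idea, not the exact pipeline, and the grid arrays, coordinates and window size are placeholders – the three channels are stretched to 0–255, stacked into an RGB composite, and small pixel windows around each confirmed point become the training samples:

```python
# Sketch: stack three geophysical grids into an 8-bit RGB composite and
# sample small windows around the known mineralized points.
import numpy as np

def to_uint8(grid: np.ndarray) -> np.ndarray:
    """Linearly stretch a grid to the 0-255 range."""
    lo, hi = np.nanmin(grid), np.nanmax(grid)
    return np.clip((grid - lo) / (hi - lo) * 255, 0, 255).astype(np.uint8)

# Placeholder grids standing in for magnetic and radiometric channels.
mag = np.random.rand(2000, 2000)
k   = np.random.rand(2000, 2000)
th  = np.random.rand(2000, 2000)

rgb = np.dstack([to_uint8(mag), to_uint8(k), to_uint8(th)])  # (rows, cols, 3)

def patch(image: np.ndarray, row: int, col: int, half: int = 16) -> np.ndarray:
    """Extract a (2*half, 2*half, 3) window centred on a known point."""
    return image[row - half:row + half, col - half:col + half, :]

# Placeholder pixel coordinates of confirmed points (there were only 10).
known_points = [(120, 340), (980, 1500)]
samples = np.stack([patch(rgb, r, c) for r, c in known_points])
print(samples.shape)  # (n_points, 32, 32, 3)
```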

Continue reading

On Reproducibility of Papers: Magnetic Alignments

Editor’s note: there’s an update to this post here.

From time to time, I like to reproduce papers that will ease my life at the Geological Survey of Brazil. However, I usually stumble on exactly those reproducibility issues – missing implementation code, parameters, or even the dataset shown – and that’s exactly what happened when I tried to reproduce the paper “Towards the automated analysis of regional aeromagnetic data”, by Holden et al.

Continue reading