Ternary Decomposer – C++ Update

I just rewrote the ternary decomposer in C++. The speed gains are substantial, since Python is painfully slow in its loops. It also took me three days to figure out that ESRI’s ArcGIS appears to have some issue with Float32 TIFFs, so I had to use unsigned Int16 instead. I’ll update it someday to use GDAL’s Polygonize, as the Python version does.

A compiled binary release is available on my GitHub.

The Search Dilemma – Part 4

Note: This part produced a few hundred gigabytes of experimental data, so I’ll be showing only the important points. Also, the next part is probably the last one.

We spoke about Elasticsearch in the first three posts of this series. However, instrumenting the Java Virtual Machine introduced a large overhead, on top of the overhead caused by migrations between big and LITTLE cores, so we opted to change both the benchmarking engine and the processor type.

Hence, we started working with a version of Xapian modified by Tailbench[1], a benchmarking suite. Xapian is the actual search engine core, written in C++, while Tailbench provides a simplified API for configuring both the server and the client (and also generates the requests). Finally, we moved from big.LITTLE/AMP processors to ones with DVFS.

Continue reading

Python/Julia Packages for Scientific Computing in Geosciences

The best thing that has ever happened to Python is NumPy. Python is an easy language to learn overall, and NumPy, written in good old C, lets everything run seamlessly. This helped a lot in making Python rise as a scientific language for fields like image processing, machine learning and remote sensing.

On the other hand, Julia is a newer language which was in almost-perpetual beta until one or two years ago. Unlike Python, its entire standard library is written in Julia. This allows Julia to run even faster than NumPy without the worries of excessive code optimization (e.g. operation broadcasting/vectorization, or even whether your NumPy is using MKL or OpenBLAS). Most benchmarks compare Julia against heavily optimized, compiled Python code (via Nuitka or Cython) and, well, if you’re going that far to write good code, it’s better to change engines.
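To make the broadcasting/vectorization concern concrete, here is a minimal NumPy sketch (the array names and values are made up for illustration). It computes the same result twice: once with an explicit Python loop, and once as a single vectorized expression that relies on broadcasting.

```python
import numpy as np

# Made-up data: a 2x3 grid of values and one offset per column.
values = np.arange(6, dtype=np.float64).reshape(2, 3)
offsets = np.array([10.0, 20.0, 30.0])

# Explicit loop: every element goes through the Python interpreter.
looped = np.empty_like(values)
for i in range(values.shape[0]):
    for j in range(values.shape[1]):
        looped[i, j] = values[i, j] * 2.0 + offsets[j]

# Vectorized + broadcast: `offsets` (shape (3,)) is stretched across
# both rows, and the looping happens inside NumPy's compiled code.
vectorized = values * 2.0 + offsets
```

Both arrays hold the same numbers; the difference is only in who runs the loop. This is the kind of rewrite that Python code needs to be fast.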

The issues with Julia are that it is too new (so many packages are not mature enough or are barely maintained, unlike Python’s) and that part of its design is a new concept for those used to traditional object-oriented programming; in particular, its implementation of multiple dispatch is awesome. Finally, Julia has a slow startup time, as it compiles the code before running it, and that might be a hassle for some types of applications.

But this post is not about comparing Julia and Python. Rather, it’s a rant about how I saw three or four “geo”programming courses last week (mostly publicized on LinkedIn and Facebook) and none of them included the packages that actually matter to geoscientists in their syllabi.

I’ll list my take on this.

Continue reading

From .NET Framework to .NET 5

Note: .NET 5 is still in preview and does not officially support Visual Basic .NET Forms as of April 23, 2020.

It has been over a decade since I last wrote code in Visual Basic 6. A bunch of friends from a community I was part of wanted to revive an old game engine, written in Visual Basic .NET, and I accepted the invitation; hence I am currently toying with Elysium in my free time.

VB6 stopped being supported by Microsoft a long time ago. While Elysium by itself deserves a separate post, I wanted to make the server run on Linux, so the team could create a default map together. Thus, I set out to convert my project from .NET Framework 4.5 to .NET 5 (which also implies converting to the new “Core” virtual machine).

Continue reading

On ‘Indexing Wikipedia at Elasticsearch’

After nearly one year working with Tailbench (and Xapian), I had to turn back to Elasticsearch. This happened because Tailbench/Xapian has a lot of limitations that would require a heavy code rewrite to bypass, and, well, I was lacking the time to do it.

While I won’t discuss again the issues I had with the Java Virtual Machine (the ideas we had to circumvent the bottlenecks will come later on), this is essentially a post that might be found by someone who had the same issues as me.

Continue reading

Pattern Recognition for Images through Neural Nets

Note: The next two or three posts will have their code written in Python. After them, I’ll be moving exclusively to Julia due to performance issues.

I was recently asked to evaluate if there’s any kind of relationship between the geophysical signatures of magnetic+radiometric data and iron mineralizations. The dataset for the study area is about 10 gigabytes, with nearly 40 million lines, which is too much for both my notebook and the personal server which runs this website’s nginx.

But the main point here is that there aren’t many iron-mineralized points to build a decent model. Actually, there were only 10 confirmed points for a really large area. Well, I decided to try anyway but, instead of using the actual measurement values, my solution was to resort to RGB pixels.

Continue reading

On Reproducibility of Papers: Magnetic Alignments

From time to time, I like to reproduce papers that will make my life easier at the Geological Survey of Brazil. However, I usually stumble on reproducibility issues (missing implementation code, parameters or even the dataset shown) and that’s exactly what happened when I tried to reproduce the paper “Towards the automated analysis of regional aeromagnetic data”, by Holden et al.

Continue reading

The Search Dilemma: Part 3

Note: The approach described in this post was tested more than a year ago, and we (my master’s advisor and I) decided to drop it, as the actual implementation ran into too many technical barriers (e.g. real-time profiling with low overhead). Most of what follows are insights I had later.

In the last post, we discussed a first approach to the hot function model: whenever a thread entered that zone, it would be promoted to the faster cores and, when exiting, demoted to the slower cores. The first results did not show any improvement in quality of service. Why?
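As a rough illustration of the promote-on-entry, demote-on-exit idea (not the instrumentation actually used in the experiments), the sketch below wraps a "hot" function in a Python context manager. The names `hot_zone` and `score_documents`, and the core ids, are all hypothetical. The pinning call is injected, so the example runs anywhere and simply records the migrations that would happen.

```python
from contextlib import contextmanager
from typing import Callable, List, Set


@contextmanager
def hot_zone(fast: Set[int], slow: Set[int], pin: Callable[[Set[int]], None]):
    """Promote the caller to `fast` cores on entry, demote to `slow` on exit."""
    pin(fast)  # promotion when entering the hot function
    try:
        yield
    finally:
        pin(slow)  # demotion when leaving, even if the body raises


def score_documents(n: int) -> int:
    """Hypothetical stand-in for the expensive part of a search query."""
    return sum(i * i for i in range(n))


# Recording stand-in for a real pinning call (core ids are assumptions).
migrations: List[Set[int]] = []

with hot_zone(fast={4, 5}, slow={0, 1, 2, 3}, pin=migrations.append):
    result = score_documents(1000)
```

On Linux, one could pass something like `lambda cpus: os.sched_setaffinity(0, cpus)` as `pin` to actually move the caller. Every entry and exit then costs a migration, which is worth keeping in mind when reading the results.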

Continue reading

Color Quantization for Ternary Maps with GDAL

One of the tasks geophysicists usually perform at the Geological Survey of Brazil is correlating geophysics with geology for geological mapping. This task is done using gammaspectrometric data and, since its penetration (skin depth) is very low, the data is strongly correlated with the surface geology. By assigning colors to Potassium, Thorium and Uranium values, it is possible to generate what is known as a ‘ternary map’, pretty much a geological map. As this map is usually shown in RGB, the range of colors is about 16 million (256 values on each of the three channels), and this is an issue for interpretation. In this post, we discuss a method to reduce it to 27 colors.
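One simple way to end up with exactly 27 colors is to keep only three levels per channel, since 3 × 3 × 3 = 27. The NumPy sketch below does that with equal-width bins. Whether the full post bins the channels this way is an assumption, and `quantize_rgb` is a hypothetical helper name; reading and writing the raster with GDAL is left out.

```python
import numpy as np


def quantize_rgb(rgb: np.ndarray, levels: int = 3) -> np.ndarray:
    """Map each 8-bit channel to `levels` evenly spaced values."""
    rgb = np.asarray(rgb, dtype=np.uint8)
    # Bin index per channel, 0..levels-1, using equal-width bins over 0..255.
    bins = (rgb.astype(np.uint16) * levels) // 256
    # One representative value per bin, spread over the 0..255 range.
    palette = np.round(np.linspace(0, 255, levels)).astype(np.uint8)
    return palette[bins]


# Tiny made-up 2x2 "ternary map": rows x cols x (R, G, B).
image = np.array(
    [[[10, 130, 250], [90, 90, 90]],
     [[200, 0, 85], [255, 255, 255]]],
    dtype=np.uint8,
)
quantized = quantize_rgb(image)
```

Each channel of `quantized` can only take three distinct values, so the image cannot contain more than 27 distinct colors. Equal-width bins are the simplest choice; bins based on the data distribution (e.g. percentiles) would be an equally valid variant.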

Continue reading