2023 - My year in review
16 Jan 2024
Looking back on learnings, changes and… culture.
The more I work with data science, the more I see the need for an organized process. It can be enforced by conventions, documentation and code reviews, but, as always, it is automation that pays off the most. 'A developer must be lazy', as one of my teachers used to say. This is why, when I took part in another Kaggle competition, I decided to build it around more mature ideas than just experimenting in a Jupyter Notebook.
I took part in Kaggle's NLP competition called Disaster Tweets. The goal was to predict whether a given tweet is about a real catastrophe or not.
If you have ever been forced to stick with a specific Java version because of a dependent library or (ahem) a licensing restriction, you can now breathe a sigh of relief: thanks to the fantastic JVM community, it is possible to avoid this issue! I will show you how to achieve this in a simple, ready-to-use project.
Did you know that, according to Java implementers, about 25% of the memory consumed by large-scale applications is taken up by Strings? And what if I told you that you can decrease this value with a single command?
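My guess (the teaser itself does not name it) is that the "single command" refers to the JVM's string deduplication option, which works together with the G1 garbage collector. A minimal, assumed example of enabling it when launching an application:

```
# Hypothetical launch command; my-app.jar stands for your own application
java -XX:+UseG1GC -XX:+UseStringDeduplication -jar my-app.jar
```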
Working as a search engineer myself, I decided to develop a framework for finding optimal query weights for search engines like Elasticsearch or Solr. It is based on a branch of machine learning called genetic programming, inspired by the process of natural selection. In this post I describe it and briefly discuss what a good process for building search quality should look like. Let's start!
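To make the idea concrete: the sketch below is not the framework from the post, and it uses a plain genetic-algorithm loop over boost vectors rather than full genetic programming. It evolves per-field boosts (think title^x body^y tags^z), where the `evaluate()` method is a hypothetical stand-in for running judged queries against a real index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Minimal, hypothetical sketch of evolving query field boosts.
public class BoostEvolution {
    static final Random RND = new Random(42);
    static final int FIELDS = 3;          // e.g. title, body, tags
    static final int POPULATION = 20;
    static final int GENERATIONS = 50;

    // Dummy fitness so the sketch runs on its own; a real version would issue
    // the weighted queries against the search engine and compute e.g. NDCG.
    static double evaluate(double[] boosts) {
        return -Math.abs(boosts[0] - 2.0) - Math.abs(boosts[1] - 0.5) - Math.abs(boosts[2] - 1.0);
    }

    // Mutation: nudge one randomly chosen boost, keeping it non-negative.
    static double[] mutate(double[] parent) {
        double[] child = parent.clone();
        int i = RND.nextInt(FIELDS);
        child[i] = Math.max(0.0, child[i] + RND.nextGaussian() * 0.3);
        return child;
    }

    public static void main(String[] args) {
        // Random initial population of boost vectors.
        List<double[]> population = new ArrayList<>();
        for (int i = 0; i < POPULATION; i++) {
            double[] boosts = new double[FIELDS];
            for (int f = 0; f < FIELDS; f++) boosts[f] = RND.nextDouble() * 3.0;
            population.add(boosts);
        }
        for (int g = 0; g < GENERATIONS; g++) {
            // Selection: keep the fitter half, refill with mutated copies.
            population.sort(Comparator.comparingDouble(BoostEvolution::evaluate).reversed());
            List<double[]> next = new ArrayList<>(population.subList(0, POPULATION / 2));
            while (next.size() < POPULATION) {
                next.add(mutate(next.get(RND.nextInt(POPULATION / 2))));
            }
            population = next;
        }
        population.sort(Comparator.comparingDouble(BoostEvolution::evaluate).reversed());
        double[] best = population.get(0);
        System.out.printf("best boosts: title^%.2f body^%.2f tags^%.2f%n", best[0], best[1], best[2]);
    }
}
```

In a real setup the fitness function would be the expensive part: running a set of queries with known relevance judgements against Elasticsearch or Solr and scoring the results.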
Although Solr comes with a standard tokenizer implementation that handles most texts well, there are cases where it is helpless. Imagine a document with many numbers, many of which are followed by a percentage sign. In certain contexts it is expected to distinguish queries that refer to those percentages from queries that refer to plain numbers. How do we achieve that? We need a custom tokenizer.
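For illustration only (this is not the implementation from the post; the class name is made up and the exact package of CharTokenizer can differ between Lucene versions), one way to get there is to extend Lucene's CharTokenizer and declare '%' a token character, so that "50%" is emitted as a single token distinct from "50":

```java
import org.apache.lucene.analysis.util.CharTokenizer;

// Hypothetical sketch: keep '%' attached to the preceding number,
// so "50%" survives as one token instead of being reduced to "50".
public class PercentAwareTokenizer extends CharTokenizer {

    @Override
    protected boolean isTokenChar(int c) {
        // Letters and digits form tokens as usual; '%' is also kept inside a token.
        return Character.isLetterOrDigit(c) || c == '%';
    }
}
```

To use it from Solr you would additionally wrap it in a TokenizerFactory and reference that factory in the field type definition in the schema.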