30 Mar 2017
Algorithms for recognizing entities in text are among the most crucial aspects of text analysis. They lead to a better understanding of the content, enable additional operations such as filtering or grouping and, most importantly, allow data to be processed automatically. In the previous post I announced that I would combine text indexing with such extraction, and to keep my promise I created a fork of Solr Text Tagger.
Read more!
19 Feb 2017
The process of indexing in Solr is an advanced topic covered by many publications. At the most basic level it can be described as putting data into previously prepared containers. But what if the user wants to perform additional data processing depending on documents that are already in the index?
Read more!
31 Oct 2016
Recently, going through the Spring MVC documentation, I found a feature I hadn’t previously used: asynchronous request processing. It was introduced with the Servlet 3 API and has been part of Java EE since its sixth edition from 2009; Spring started supporting it three years later. As it looks interesting (and as async has been a popular word in a developer’s journey since at least the early Web 2.0 days) I decided to dig deeper into its details.
Read more!
28 Jun 2016
Java is well known for requiring quite a lot of code to perform simple tasks: all those getter/setter methods handled nicely by the competitors, the common Problem Factories, Calendar & Date or logging jumbo. As more languages with plain syntax arise, sticking with the current approach seems a bit out-of-date. There are even some proposals to add JavaScript-like val folding to change things, but with Oracle’s lacking investment it is hard to believe that any changes will appear any time soon. On the other hand, the Java ecosystem is full of decent libraries that can fill this gap; one of these libraries is Project Lombok.
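To make the getter/setter complaint concrete, here is a minimal sketch of a hand-written value class in plain Java. The `Person` class and its fields are hypothetical, chosen only for illustration. With Lombok on the classpath, annotations such as `@Getter`, `@Setter` or `@Data` would generate the accessors, `equals`, `hashCode` and `toString` at compile time, so only the two field declarations would need to be written by hand.

```java
import java.util.Objects;

// Hypothetical value class, written by hand to show the boilerplate
// that a library such as Lombok can generate instead.
final class Person {
    private String name;
    private int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    // Two persons are equal when both fields match.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Person)) return false;
        Person that = (Person) other;
        return age == that.age && Objects.equals(name, that.name);
    }

    // Must stay consistent with equals: same fields, same hash.
    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }

    @Override
    public String toString() {
        return "Person(name=" + name + ", age=" + age + ")";
    }
}
```

Two fields already cost about thirty lines, and every new field means touching the accessors, `equals`, `hashCode` and `toString` again.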
Read more!
21 May 2016
Server-side request forgery occurs when an attacker gets into one application and is able to use it to perform some activity on another application (or applications). It can be scanning an internal network, calling services or making requests to another website, which is our case. Note that the hacked application would be responsible for the attack, as it is the one making the call, not the hacker’s machine. More information can be found on numerous sites on the Internet, e.g. here.
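One common way to limit this kind of abuse is to check every user-supplied URL against an allowlist before the application makes the outbound call. The sketch below is only an illustration under that assumption, not the approach taken in the post: the `OutboundUrlGuard` class, its method name and the example hosts are all hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: rejects outbound URLs whose scheme or host
// is not explicitly allowed. Names and hosts are illustrative only.
final class OutboundUrlGuard {
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");
    private final Set<String> allowedHosts;

    OutboundUrlGuard(Set<String> allowedHosts) {
        this.allowedHosts = allowedHosts;
    }

    boolean isAllowed(String userSuppliedUrl) {
        try {
            URI uri = new URI(userSuppliedUrl);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            // Relative URLs, file: URLs and malformed authorities have no host.
            if (scheme == null || host == null) {
                return false;
            }
            return ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))
                    && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Anything that does not parse as a URI is rejected.
            return false;
        }
    }

    public static void main(String[] args) {
        OutboundUrlGuard guard = new OutboundUrlGuard(Set.of("api.example.com"));
        System.out.println(guard.isAllowed("https://api.example.com/status"));
        System.out.println(guard.isAllowed("http://192.168.0.10/admin"));
    }
}
```

A host check like this does not cover redirects or DNS answers that change between the check and the request, so it should be treated as one layer of defence rather than a complete fix.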
Read more!
10 Apr 2016
The ideal situation is when the whole index can be located in memory, because disk operations are much slower than those in RAM. What’s more, companies often have to meet the requirements of a tender or reduce server costs, which puts pressure on developers to come up with a solution that will make the index smaller.
Read more!
13 Mar 2016
Recently I needed to measure Solr memory usage and decided to use a free Oracle tool, VisualVM. As usual, the official documentation provides some help, but to make things simpler I extended my Solr startup script for Windows to put all the important information in one place.
Read more!