It was a simple bug in the XY Cut layout recognition code that made it too eager to see columns everywhere.
Also removed the dependence of the layout analysis algorithms on the display DPI (introduced by the recently added feature of using KScreen) to make their behavior more predictable and reproducible.
BUGS: 326207
BUGS: 331090
FIXED-IN: 4.13.0
REVIEW: 115759
Having functions which are defined but not used serves no gain. This patch
therefore removes the extra method and updates the comment reference in the
second one to make it standalone.
REVIEW: 114959
Also simplified code a bit by removing unnecessary calls to toLower in TextPagePrivate::findTextInternalForward and TextPagePrivate::findTextInternalBackward I also fixed a small bug: the letter capital I with dot above (U+0130) did not match itself in case-insensitive mode on Qt 4.8.4 (U+0130 still does not match lowercase i (U+0069), which can be considered another bug, that I didn't fix (although this behavior conforms to the Unicode case folding rules)).
(I did not implement the Knuth-Morris-Pratt algorithm that I promised in a comment of Bug 323263 because on second thought I find that the win, if any, would probably be negligible except for some very special documents and special query strings.)
BUGS: 323262
BUGS: 323263
REVIEW: 112135
When searching backwards end is not actually words.end but words.begin (since the loop goes backwards) hence we can't pass end to stringLengthAdaptedWithHyphen
I've now renamed end to loop_end to make it a bit more clear.
BUGS: 309030
FIXED-IN: 4.9.4
This is needed because sometimes the hash had collissions and make it impossible to know which character we had to put back
Now we just keep the word and the characters together in the same class and it is much easier to correlate them
Also the code gets much more simplified and less new/delete are needed
This fixes the crash in 287138
I am still concerted that we use the page width and height in TextPagePrivate::correctTextOrder, but that should not cause crashes, at most some misplacements of very small text
TextPagePrivate::correctTextOrder was running makeAndSortLines + calculateStatisticalInformation to calculate the tcx, tcy
to pass it to XYCutForBoundingBoxes and then in XYCutForBoundingBoxes we were doing the same but just for when countLoop was not 0, thus
if we remove the first code and remove the check for countLoop being not 0 we end up with the same behaviour
* Remove m_word_chars_map, m_XY_cut_tree, m_lines and m_line_rects: They are not really member
variables and where used only to pass info from one method to another, we use variables and parameters
for that now
* Make output parameters be * instead of & following Qt guidelines
* Fix all memory leaks, valgrind is happy now :-)