Tesseract is now decoupled from the tessdata language corpus.
This avoids recompilation when building Tesseract with a custom set
of languages.
Update k2pdfopt to use the new wrapper interface.
* pgadmin: use https homepage
* msn-pecan: move homepage to github
google code is now unavailable
* pidgin-latex: use https for homepage
* pidgin-opensteamworks: use github for homepage
google code is unavailable
* putty: use https for homepage
* ponylang: use https for homepage
* picolisp: use https for homepage
* phonon: use https for homepage
* pugixml: use https for homepage
* pioneer: use https for homepage
* packer: use https for homepage
* pokerth: use https for homepage
* procps-ng: use https for homepage
* pycaml: use https for homepage
* proot: move homepage to .github.io
* pius: use https for homepage
* pdfread: use https for homepage
* postgresql: use https for homepage
* ponysay: move homepage to new site
* prometheus: use https for homepage
* powerdns: use https for homepage
* pm-utils: use https for homepage
* patchelf: move homepage to https
* tesseract: move homepage to github
* quodlibet: move homepage from google code
* jbrout: move homepage from google code
* eiskaltdcpp: move homepage to github
* nodejs: use https for homepage
* nix: use https for homepage
* pdf2djvu: move homepage from google code
* game-music-emu: move homepage from google code
* vacuum: move homepage from google code
Upstream changelog:
* Made some fine tuning to the hOCR output.
* Added TSV as another optional output format.
* Fixed ABI break introduced in 3.04.00 with the AnalyseLayout()
method.
* text2image tool - Enable all OpenType ligatures available in a font.
This feature requires Pango 1.38 or newer.
* Training tools - Replaced asserts with tprintf() and exit(1).
* Fixed Cygwin compatibility.
* Improved multipage tiff processing.
* Improved the embedded pdf font (pdf.ttf).
* Enable selection of OCR engine mode from command line.
* Changed tesseract command line parameter '-psm' to '--psm'.
* Added new C API for orientation and script detection, removed the old
one.
* Increased minimum autoconf version to 2.59.
* Removed dead code.
* Fixed many compiler warnings.
* Fixed memory and resource leaks.
* Fixed some issues with the 'Cube' OCR engine.
* Fixed some openCL issues.
* Added option to build Tesseract with CMake build system.
* Implemented CPPAN support for easy Windows building.
The upstream URL of the change log is:
https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.00
Tested by building against the following packages that directly depend
on it:
* vapoursynth (with ocrSupport = true)
* pyocr (fails)
* vobsub2srt
Also tested against the following NixOS VM tests that have OCR enabled:
* nixos/tests/chromium.nix -A stable
* nixos/tests/emacs-daemon.nix
* nixos/tests/installer.nix -A luksroot
* nixos/tests/lightdm.nix
* nixos/tests/plasma5.nix
* nixos/tests/sddm.nix
All of the packages and tests except pyocr build/succeed on
x86_64-linux.
Fixing pyocr is outside of the scope of this commit and will happen very
soon.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
I removed that attribute in 68bc260ca2 because the language files were
no longer distributed as separate files. However, even if we only want
to use, for example, the English training data, the closure size of
Tesseract gets quite large (around 1.2 GB), which is a bit much just to
be able to run NixOS VM tests.
For this reason I've also switched the VM tests back to using only the
English language.
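A minimal sketch of such an English-only override, assuming the re-added
attribute is enableLanguages and takes a list of language codes as
described elsewhere in this log. The variable name tesseractEng is
hypothetical and only used for illustration:

```nix
{ pkgs ? import <nixpkgs> {} }:

let
  # Hypothetical name; restricts the bundled training data to English only,
  # which keeps the closure small enough for VM tests.
  tesseractEng = pkgs.tesseract.override { enableLanguages = [ "eng" ]; };
in
  tesseractEng
```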
Tested using the following VM tests (the ones that have OCR enabled) on
x86_64-linux:
* nixos/tests/chromium.nix -A stable
* nixos/tests/emacs-daemon.nix
* nixos/tests/installer.nix -A luksroot
* nixos/tests/lightdm.nix
* nixos/tests/plasma5.nix
* nixos/tests/sddm.nix
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
From the upstream changelog:
* Tesseract development is now done with Git and hosted at github.com
(Previously we used Subversion as a VCS and code.google.com for
hosting).
So let's move over to the GitHub repository, where the organisation also
includes a full repository for tessdata, so we no longer need to fetch
it one-by-one.
The build also got significantly simpler, because we no longer need to
run autoconf, nor do we need to patch the configure script for
Leptonica headers.
This also has the advantage that we don't need to use the
enableLanguages attribute for the test runner anymore.
Full upstream changelog can be found at:
https://github.com/tesseract-ocr/tesseract/blob/c4d273d33cc36e/ChangeLog
Tested against all NixOS tests with enabled OCR (chromium, emacs-daemon,
installer.luksroot and lightdm).
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Cc: @viric
The following parameters are now available:
* hardeningDisable
To disable specific hardening flags
* hardeningEnable
To enable specific hardening flags
Only the cc-wrapper supports this right now, but these may be reused by
other wrappers, builders or setup hooks.
cc-wrapper supports the following flags:
* fortify
* stackprotector
* pie (disabled by default)
* pic
* strictoverflow
* format
* relro
* bindnow
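As a rough sketch of how these parameters might be set on a derivation
(the package name, version and source path are hypothetical and only
serve as an illustration; the flag names are taken from the list above):

```nix
{ stdenv }:

stdenv.mkDerivation {
  # Hypothetical package, used only to illustrate the two parameters.
  name = "example-1.0";
  src = ./.;

  # Turn off individual hardening flags that the package cannot cope with.
  hardeningDisable = [ "format" "fortify" ];

  # Opt into a flag that is disabled by default.
  hardeningEnable = [ "pie" ];
}
```

Flags that are not mentioned in either list keep their default state.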
Especially useful for our OCR-based VM tests, where we only need the
English language. By default the argument is null, so all languages are
included. If a list of language names is passed, only those languages
are enabled. For example, to enable support for only English and
Spanish:
    tesseract.override { enableLanguages = [ "eng" "spa" ]; };
Signed-off-by: aszlig <aszlig@redmoonstudios.org>