Commit Graph

41 Commits

Author SHA1 Message Date
Erik Arvstedt
1a32663efc
treewide: rename maintainer earvstedt -> erikarvstedt
The maintainer name now matches the GitHub username, which simplifies
maintainer notifications.
2022-06-26 19:12:18 +02:00
Anselm Schüler
92a849e613
tesseract5: init at 5.1.0 2022-05-02 13:44:04 +02:00
Anselm Schüler
1b38ecfc74
tesseract4.tessdata: 4.0.0 -> 4.1.0 2022-05-02 13:44:03 +02:00
Erik Arvstedt
7d2cee008b
tesseract: add wrapper test 2022-05-02 13:44:03 +02:00
Erik Arvstedt
0532532835
tesseract: use multi-line build inputs format 2022-05-02 12:39:50 +02:00
Erik Arvstedt
683c50d805
tesseract: switch to SRI hash format 2022-05-02 12:39:49 +02:00
Erik Arvstedt
42b719a3be
tesseract: fix fetch-language-hashes
- Don't match spaces in language names
- Remove duplicate languages
2022-05-02 12:39:49 +02:00
Artturin
e50c9981d3 tesseract4: apply patches to fix build on aarch64-darwin 2021-11-14 16:21:22 +02:00
Ben Siraphob
e03c068af5 treewide: makeWrapper buildInputs to nativeBuildInputs 2021-02-19 20:09:16 +07:00
Pavol Rusnak
a6ce00c50c
treewide: remove stdenv where not needed 2021-01-25 18:31:47 +01:00
Jonathan Ringer
9bb3fccb5b treewide: pkgs.pkgconfig -> pkgs.pkg-config, move pkgconfig to aliases.nix
continuation of #109595

pkgconfig was aliased in 2018, however, it remained in
all-packages.nix due to its wide usage. This cleans
up the remaining references to pkgs.pkgconfig and
moves the entry to aliases.nix.

python3Packages.pkgconfig remained unchanged because
it's the canonical name of the upstream package
on pypi.
2021-01-19 01:16:25 -08:00
Ben Siraphob
badf51221d treewide: stdenv.lib -> lib 2021-01-16 17:58:11 +07:00
Michael Reilly
84cf00f980
treewide: Per RFC45, remove all unquoted URLs 2020-04-10 17:54:53 +01:00
Erik Arvstedt
e6fad86853 tesseract: 4.1.0 -> 4.1.1 2020-01-17 13:19:12 +00:00
volth
46420bbaa3 treewide: name -> pname (easy cases) (#66585)
treewide replacement of

stdenv.mkDerivation rec {
  name = "*-${version}";
  version = "*";

to pname
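
As an illustration only (the package name and version below are hypothetical and not taken from this commit), a derivation after this replacement might look roughly like the following fragment:

```nix
{ stdenv }:

stdenv.mkDerivation rec {
  # Hypothetical package; before the change this would have been
  #   name = "example-${version}";
  pname = "example";
  version = "1.0";
  # mkDerivation composes the derivation name as "${pname}-${version}".
}
```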
2019-08-15 13:41:18 +01:00
R. RyanTM
9f29ecbaf4 tesseract4: 4.0.0 -> 4.1.0
Semi-automatic update generated by
https://github.com/ryantm/nixpkgs-update tools. This update was made
based on information from
https://repology.org/metapackage/tesseract/versions
2019-07-17 09:33:34 +02:00
volth
f3282c8d1e treewide: remove unused variables (#63177)
* treewide: remove unused variables

* making ofborg happy
2019-06-16 19:59:05 +00:00
Erik Arvstedt
0289f4adf0
tesseract: add tesseract3 top-level attr 2018-12-19 18:10:43 +01:00
Erik Arvstedt
8d1ba999cb
tesseract: rename to tesseract4, add alias
This is more consistent with the naming of the most popular versioned pkgs.
2018-12-19 18:09:56 +01:00
Erik Arvstedt
b818997807
tesseract: add separate language derivations
This frees users from downloading all languages when building
Tesseract with a custom set of languages.

`enableLanguagesHash` is now obsolete.
2018-12-19 18:08:21 +01:00
Erik Arvstedt
aaaed13077
tesseract: add a wrapper to setup languages
Tesseract is now decoupled from the tessdata language corpus.

This avoids recompilation when building Tesseract with a custom set
of languages.

Update k2pdfopt to use the new wrapper interface.
2018-12-19 18:08:16 +01:00
Erik Arvstedt
45d2a2dd91
tesseract: change file layout
Rename default.nix -> tesseract3.nix
Rename 4.x.nix -> tesseract4.nix

This is needed for the following commits.
2018-12-19 18:07:39 +01:00
Ryan Mulligan
d4b9752212 tesseract_4: 4.00.00alpha-git-20170410 -> 4.0.0
The 4.0.0 stable release is out.

Changelog: https://github.com/tesseract-ocr/tesseract/wiki/4.0x-Changelog
2018-11-24 15:25:59 -08:00
symphorien
b30d52905e tesseract: make tessdata a fixed-output derivation (#41227)
the full tessdata is nearly a GB, so sparing a copy each time we need to
rebuild tesseract without updating tessdata is worth it.
2018-06-19 00:03:48 +02:00
Matthew Justin Bauer
2eacddf0dc treewide: homepage URL fixes (#28475)
* pgadmin: use https homepage

* msn-pecan: move homepage to github

google code is now unavailable

* pidgin-latex: use https for homepage

* pidgin-opensteamworks: use github for homepage

google code is unavailable

* putty: use https for homepage

* ponylang: use https for homepage

* picolisp: use https for homepage

* phonon: use https for homepage

* pugixml: use https for homepage

* pioneer: use https for homepage

* packer: use https for homepage

* pokerth: use https for homepage

* procps-ng: use https for homepage

* pycaml: use https for homepage

* proot: move homepage to .github.io

* pius: use https for homepage

* pdfread: use https for homepage

* postgresql: use https for homepage

* ponysay: move homepage to new site

* prometheus: use https for homepage

* powerdns: use https for homepage

* pm-utils: use https for homepage

* patchelf: move homepage to https

* tesseract: move homepage to github

* quodlibet: move homepage from google code

* jbrout: move homepage from google code

* eiskaltdcpp: move homepage to github

* nodejs: use https for homepage

* nix: use https for homepage

* pdf2djvu: move homepage from google code

* game-music-emu: move homepage from google code

* vacuum: move homepage from google code
2017-08-22 20:50:04 +02:00
Matthew Bauer
f1346f5854
tesseract: supports darwin 2017-04-23 18:08:51 -05:00
aszlig
7b5263e1a6
tesseract: Package version 4.x from Git master
Tesseract 4 has got a new long short-term memory neural networking based
OCR engine which really helps a lot in terms of accuracy and our VM
tests.

I ran the new version across a bunch of different screenshots and
compared the results to the 3.x branch, and it really makes a big
difference, especially with various font rendering settings.

The only downside of this is that version 4 hasn't been released yet and
is in alpha state right now, but it will eventually get there, and the
only solutions that came to my mind for sticking with version 3 were
really sub-par:

 * Use several passes with different color negation on the screenshots.
 * Train Tesseract 3 specifically for screenshots. This is sub-par
   because we'd need to do it for Tesseract 4 from scratch again.
 * Change the test systems so that they use *only* an OCR font when
   displaying. I've actually tried this but this also isn't accurate
   enough with our default font rendering setup.
 * Turn off special font rendering settings for our tests. In
   conjunction with changing to an OCR font this might work but it won't
   catch all the cases, because applications might use their own font
   rendering.

Given that version 4 is faster[1] when it comes to OCR detection and also
the points just mentioned I think even using the alpha version just for
tests isn't going to hurt anybody.

[1]: https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
2017-04-11 03:21:46 +02:00
aszlig
c381fa9b63
tesseract: 3.04.01 -> 3.05.00
Upstream changelog:

 * Made some fine tuning to the hOCR output.
 * Added TSV as another optional output format.
 * Fixed ABI break introduced in 3.04.00 with the AnalyseLayout()
   method.
 * text2image tool - Enable all OpenType ligatures available in a font.
   This feature requires Pango 1.38 or newer.
 * Training tools - Replaced asserts with tprintf() and exit(1).
 * Fixed Cygwin compatibility.
 * Improved multipage tiff processing.
 * Improved the embedded pdf font (pdf.ttf).
 * Enable selection of OCR engine mode from command line.
 * Changed tesseract command line parameter '-psm' to '--psm'.
 * Added new C API for orientation and script detection, removed the old
   one.
 * Increased minimum autoconf version to 2.59.
 * Removed dead code.
 * Fixed many compiler warnings.
 * Fixed memory and resource leaks.
 * Fixed some issues with the 'Cube' OCR engine.
 * Fixed some OpenCL issues.
 * Added option to build Tesseract with CMake build system.
 * Implemented CPPAN support for easy Windows building.

The upstream URL of the change log is:

https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.00

Tested by building against the following packages that directly depend
on it:

 * vapoursynth (with ocrSupport = true)
 * pyocr (fails)
 * vobsub2srt

Also tested against the following NixOS VM tests that have OCR enabled:

 * nixos/tests/chromium.nix -A stable
 * nixos/tests/emacs-daemon.nix
 * nixos/tests/installer.nix -A luksroot
 * nixos/tests/lightdm.nix
 * nixos/tests/plasma5.nix
 * nixos/tests/sddm.nix

All of the packages and tests except pyocr build/succeed on
x86_64-linux.

Fixing pyocr is outside of the scope of this commit and will happen very
soon.

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
2017-04-11 03:21:32 +02:00
aszlig
288a79187c
tesseract: Reintroduce enableLanguages
I've removed that attribute in 68bc260ca2,
because the language files were no longer distributed as separate files,
but if we for example only want to use the English training data, the
closure size of Tesseract gets quite large (around 1.2 GB), which is a
bit much just to be able to run NixOS VM tests.

For this reason I've also switched the VM tests back to using only the
English language.

Tested using the following VM tests (the ones that have OCR enabled) on
x86_64-linux:

 * nixos/tests/chromium.nix -A stable
 * nixos/tests/emacs-daemon.nix
 * nixos/tests/installer.nix -A luksroot
 * nixos/tests/lightdm.nix
 * nixos/tests/plasma5.nix
 * nixos/tests/sddm.nix

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
2017-04-11 03:21:26 +02:00
aszlig
68bc260ca2
tesseract: 3.02.02 -> 3.04.01
From the upstream changelog:

 * Tesseract development is now done with Git and hosted at github.com
   (Previously we used Subversion as a VCS and code.google.com for
   hosting).

So let's move over to the GitHub repository, where the organisation also
includes a full repository for tessdata, so we no longer need to fetch
it one-by-one.

The build also got significantly simpler, because we no longer need to
run autoconf, nor do we need to patch the configure script for
Leptonica headers.

This also has the advantage that we don't need to use the
enableLanguages attribute for the test runner anymore.

Full upstream changelog can be found at:

https://github.com/tesseract-ocr/tesseract/blob/c4d273d33cc36e/ChangeLog

Tested against all NixOS tests with enabled OCR (chromium, emacs-daemon,
installer.luksroot and lightdm).

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Cc: @viric
2016-12-19 22:25:38 +01:00
Franz Pletz
aff1f4ab94 Use general hardening flag toggle lists
The following parameters are now available:

  * hardeningDisable
    To disable specific hardening flags
  * hardeningEnable
    To enable specific hardening flags

Only the cc-wrapper supports this right now, but these may be reused by
other wrappers, builders or setup hooks.

cc-wrapper supports the following flags:

  * fortify
  * stackprotector
  * pie (disabled by default)
  * pic
  * strictoverflow
  * format
  * relro
  * bindnow
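
A minimal sketch of how a package might use these parameters, assuming a hypothetical derivation (the package name, version, and source path are made up for illustration; only `hardeningDisable`, `hardeningEnable`, and the flag names come from the list above):

```nix
{ stdenv }:

stdenv.mkDerivation {
  pname = "example";   # hypothetical package
  version = "0.1";
  src = ./.;           # hypothetical source location

  # Turn off a single flag from the default set.
  hardeningDisable = [ "format" ];

  # Turn on a flag that is disabled by default.
  hardeningEnable = [ "pie" ];
}
```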
2016-03-05 18:55:26 +01:00
Robin Gloster
ea1de67f35 tesseract: turn off format hardening 2016-02-20 22:33:10 +00:00
Mateusz Kowalczyk
6014752e73 tesseract: fix postInstall
We needed to separate each of the unpack commands.
2015-05-23 02:27:47 +01:00
aszlig
adb7581459
tesseract: Allow to specify a subset of languages.
Especially useful for our OCR based VM tests, where we only need the
English language. By default the argument is null so all languages are
included. If a list of language names is passed only those languages are
enabled, for example:

tesseract.override { enableLanguages = [ "eng" "spa" ]; };

To only enable support for English and Spanish languages.

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
2015-05-22 07:45:59 +02:00
Mateusz Kowalczyk
03a37d5851 Add Japanese to default tesseract languages 2014-08-17 11:54:25 +01:00
Mateusz Kowalczyk
7a45996233 Turn some license strings into lib.licenses values 2014-07-28 11:31:14 +02:00
Domen Kozar
808cadd390 tesseract: simplify 2013-06-12 00:50:52 +02:00
Domen Kozar
1b64fc9360 tesseract: upgrade to 3.02.02 and add some languages 2013-06-11 19:22:30 +02:00
Florian Friesdorf
892947cd93 tesseract-3.0.1
svn path=/nixpkgs/trunk/; revision=34453
2012-06-11 10:28:28 +00:00
Lluís Batlle i Rossell
9a0a0c92c7 Adding training results files for some languages to tesseract to be able to do OCR directly.
svn path=/nixpkgs/trunk/; revision=26956
2011-04-24 20:01:19 +00:00
Lluís Batlle i Rossell
626f654602 Adding tesseract, an OCR engine I just found but never tried.
svn path=/nixpkgs/trunk/; revision=26952
2011-04-24 18:04:07 +00:00