Abbyy Linux Serial Terminal
Although I use Linux both at home and at work, for some tasks, such as OCR for Korean and Chinese, I have had to rely on proprietary software on Windows (ABBYY FineReader gives excellent recognition results, by the way). This is starting to change, thanks to the Tesseract OCR engine, currently sponsored by Google. Tesseract has been around for several years, but it wasn't easily accessible before the advent of GUI frontends that make it easy to select the area of an image to be recognized.
The two most popular frontends to Tesseract are YAGF (which also works with the Cuneiform OCR engine) and gimagereader, both of which now use the Qt framework (the latter used to be based on GTK, but recent versions can also use Qt).
Screenshot of YAGF.
Tesseract's English-language recognition is almost on par with ABBYY FineReader for 300 dpi images, but it is much worse than FineReader at recognizing text in images below 300 dpi. For non-English text, especially CJK (Chinese, Japanese, Korean) and other Asian scripts, however, Tesseract still has a long way to go before it matches FineReader.
YAGF doesn't offer the option to use Asian languages, despite the existence of Tesseract data files for many of them. For example, here is a listing of the available tesseract-data packages for various languages in Archlinux:
    [archjun@lenovoS310 cam1]$ sudo pacman -Ss tesseract-data
    [sudo] password for archjun:
    community/tesseract-data-afr 3.02.02-5 (tesseract-data)
        Tesseract OCR data (afr)
    ...
        Tesseract OCR data (kor)
    community/tesseract-data-vie 3.02.02-5 (tesseract-data)
        Tesseract OCR data (vie)
Piping the output through wc -l gives a line count of 130; dividing by 2 (two lines per entry) gives 65 languages supported by Tesseract. As the sample output above shows, the CJK languages and Vietnamese are among them.
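The line-count arithmetic above can also be checked by parsing the listing directly. What follows is only a minimal Python sketch: the function names and the SAMPLE text are made up for illustration (SAMPLE reuses just the afr and vie entries quoted above), and it assumes every package occupies exactly two lines, a header line and an indented description line.

```python
import re
from typing import List

# Illustrative sample in the two-lines-per-entry layout shown above
# (only the afr and vie entries quoted in this post).
SAMPLE = """\
community/tesseract-data-afr 3.02.02-5 (tesseract-data)
    Tesseract OCR data (afr)
community/tesseract-data-vie 3.02.02-5 (tesseract-data)
    Tesseract OCR data (vie)
"""

# Header line of an entry: <repo>/tesseract-data-<code> <version> ...
ENTRY = re.compile(r"^\S+/tesseract-data-(\S+)\s")


def language_codes(listing: str) -> List[str]:
    """Return the language code of every tesseract-data package header."""
    codes = []
    for line in listing.splitlines():
        match = ENTRY.match(line)
        if match:  # indented description lines do not match
            codes.append(match.group(1))
    return codes


def count_by_lines(listing: str) -> int:
    """Same estimate as `wc -l` divided by 2: non-empty lines / 2."""
    non_empty = [line for line in listing.splitlines() if line.strip()]
    return len(non_empty) // 2
```

Counting header lines with language_codes is a little more robust than halving the line count, since a stray prompt or blank line in the captured output does not change the result.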
According to the YAGF developer, [...]. Fortunately, gimagereader does support OCR for Asian languages as long as the necessary Tesseract language data has been installed. You may notice that the screenshot of gimagereader shows Korean text being recognized. Unfortunately, Tesseract does a poor job of recognizing Korean.
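If you want to try Korean recognition without a GUI frontend, the tesseract command-line tool can be called directly with the -l option once the language data is installed. Here is a rough Python sketch of that; the helper names are my own, the file names in the usage note are placeholders, and it assumes the tesseract binary is on your PATH and writes its result to the output base name plus ".txt".

```python
import subprocess
from typing import List


def build_tesseract_command(image_path: str, output_base: str,
                            lang: str = "kor") -> List[str]:
    """Build the command line: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path: str, output_base: str, lang: str = "kor") -> str:
    """Run tesseract and return the recognized text.

    Assumes tesseract and the matching language data package are
    installed; raises CalledProcessError if tesseract exits with an error.
    """
    subprocess.run(build_tesseract_command(image_path, output_base, lang),
                   check=True)
    # Assumption: tesseract writes plain text to <output_base>.txt
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()
```

For example, run_ocr("scan.png", "scan_out") would run tesseract on a hypothetical scan.png with the Korean data and return the contents of scan_out.txt. Passing lang="chi_sim" switches to Simplified Chinese, provided that data package is installed.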