OwlOCR

I turned lockdown into a side project and why you should too (owlocr.com)

Show HN: New website for a Mac screen text recognizer (owlocr.com)

Hi everyone, I recently launched a browser-based text recognition tool called 'Instant Text OCR'. I tested it on Canopy's document viewer and it accurately recognizes the.

OwlOCR is the only app for the Mac that will allow you to input via screenshots, image or PDF files, or even your iOS device camera.

College of Arts and Letters Posters

Title

The Aki Yerushalayim Corpus: A Study of Loanwords in Ladino

Presenter and Co-Authors

Rachel McCullough

College

College of Arts & Letters

Department

English

Program

Applied Linguistics

Publication Date

4-2021

ORCID

0000-0002-6942-6689 (McCullough)

Abstract

Ladino (or Judeo-Spanish) is a Diasporic Jewish language spoken by Sephardi Jews. There is little existing scholarly research on Ladino, nor does it have many language learning materials. These two factors compelled me to create the Aki Yerushalayim Corpus. The initial Aki Yerushalayim Corpus of Modern Written Ladino (currently ~7,000 words) was not created to act as a reference corpus of Modern Ladino. Rather, it was created to study the composition of Ladino prose and demonstrate the utility of this type of project in the subdiscipline of language documentation. In addition, the project’s focus on cultural essays and narrative prose allows for insights into how Ladino writers construct their identities through word choice.

The research questions that directed the creation of the corpus are as follows: 1. What substrata feature most prominently in the Ladino lexicon? 2. Do the borrowed words belong to a particular semantic domain? 3. What parts of speech are most/least frequently borrowed?

To answer these questions, the gathered texts were first scanned with optical character recognition software (OwlOCR) and then proofread for mistakes. They were then tagged for part of speech and for language of origin and run through a concordancer (AntConc).
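As a rough illustration of how tokens tagged this way could be tallied against the research questions, here is a minimal Python sketch. It is not the workflow used for the poster (which relied on AntConc). The tuple layout, the tag names, the function names, and the placeholder tokens are all assumptions made for illustration, not data from the corpus.

```python
from collections import Counter

# Hypothetical tagged tokens: (token, part of speech, language of origin).
# Placeholder values only; these are NOT entries from the corpus.
tagged_tokens = [
    ("t1", "NOUN", "Spanish"),
    ("t2", "NOUN", "Hebrew"),
    ("t3", "VERB", "Spanish"),
    ("t4", "NOUN", "Turkish"),
    ("t5", "ADJ", "Hebrew"),
    ("t6", "NOUN", "Hebrew"),
]


def origin_counts(tokens):
    """Count tokens per language of origin (research question 1)."""
    return Counter(origin for _, _, origin in tokens)


def borrowed_pos_counts(tokens, base_language="Spanish"):
    """Count parts of speech among tokens not from the base language
    (research question 3)."""
    return Counter(pos for _, pos, origin in tokens if origin != base_language)


if __name__ == "__main__":
    print(origin_counts(tagged_tokens).most_common())
    print(borrowed_pos_counts(tagged_tokens).most_common())
```

Semantic domain (research question 2) would need an additional tag per token, which this sketch leaves out.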

From this brief glimpse at Ladino prose in the late 20th and early 21st centuries, the results are as expected: the language is Spanish-based, and the most frequently used substrata are Biblical Hebrew and Turkish, used primarily for religious and for secular or cultural words, respectively. This pilot gives a snapshot of how authors represent their identity as Sephardi Jews in both cultural essays and narrative prose. Further study of the distribution of these loanwords will help linguists understand how language contact and a speech community’s history provide context for which loanwords are most frequently adopted, and why.

Disciplines

Applied Linguistics | Jewish Studies

Recommended Citation

McCullough, Rachel, 'The Aki Yerushalayim Corpus: A Study of Loanwords in Ladino' (2021). College of Arts and Letters Posters. 5.
https://digitalcommons.odu.edu/gradposters2021_artsletters/5

DOWNLOADS

Since April 01, 2021

Included in

Applied Linguistics Commons, Jewish Studies Commons

OwlOCR offers simple optical character recognition of text in PDF files, in images, or on-screen, and converts it to plain text. All conversion is done securely on-device: none of your images or files are sent to third-party services in the cloud, as is the case with most applications. This also means that the application will function behind corporate firewalls and in places without internet connectivity.

Only English is supported, with a Language Recognition option that may improve the results. If the application is used for documents in other languages, you should turn off Language Recognition. Please note that even in that case only the English alphabet is supported; German umlauts, for example, are not recognized.
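To gauge whether a text falls outside this English-alphabet limit, one option is to check a sample of the expected content for letters other than A-Z and a-z. The Python sketch below is a generic pre-check and has nothing to do with OwlOCR's internals. The function name is hypothetical.

```python
import string
import unicodedata

ENGLISH_LETTERS = set(string.ascii_letters)


def letters_outside_english_alphabet(text):
    """Return the sorted distinct letters in `text` that are not A-Z or a-z."""
    # Normalize to NFC so that a base letter plus a combining mark
    # (e.g. "o" + U+0308) is treated as one precomposed letter ("ö").
    normalized = unicodedata.normalize("NFC", text)
    return sorted(
        {ch for ch in normalized if ch.isalpha() and ch not in ENGLISH_LETTERS}
    )


if __name__ == "__main__":
    print(letters_outside_english_alphabet("Größe und Maß"))
    print(letters_outside_english_alphabet("plain text 123"))
```

An empty result suggests the text uses only the English alphabet. Any letters returned are ones that, per the note above, would not be recognized.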

Please note that this application does not preserve rich-text formatting such as font size, bold, or italics. The output of the OCR function is plain text that you can copy and paste into your target application.

Features:

New in v2: Support for taking a photo or scan of a document with your iPhone or iPad (please note requirements: support.apple.com)
High quality text recognition performed on-device securely
Input filetypes supported: PDF, GIF, PNG, JPEG, JPG.
In multipage PDF files, select to either OCR page-by-page or the whole thing
Grab a screen area for instant OCR, with the result copied to the clipboard
Keyboard shortcut Cmd + F1 for screen area OCR even when OwlOCR is in the background
64-bit: macOS Catalina fully supported
Dark Mode supported