AudioGames.net Forum → Developers room → Windows10 OCR API accessing? - C#
Hello!
How can I access the win10 OCR API using C# and the dot net framework? Where should I search?
have a look at the Windows.Media.Ocr namespace. This implements UWP OCR functionality, similar to that seen in NVDA. As a result, it only works on Win10, but we're talking the best you're gonna get natively. The results of this thing are twenty times better than those of Tesseract and other engines, at least in my experience. Are you OK with OCR functionality being accessible to only a subset of users?
Here's a sample from Microsoft:
https://github.com/Microsoft/Windows-un … amples/OCR
Yes, I'm OK with that. That's what I wanted to use. But the thing that bothers me: do I really have to use UWP? Can't I use WinForms?
@4, you probably could, though I wouldn't recommend it. I doubt Visual Studio (or .NET for that matter) would be happy if you mixed two .NET libraries that are practically opposites. Give it a go though, it's worth a try.
It's a COM API after all, so you can use it without being UWP at all as far as I can see, since NVDA isn't UWP nor Windows Forms and can access it anyway. It could actually help to take a look into the NVDA source to see how they access it.
Best Regards.
Hijacker
Yeah, I'll do that. Or if it doesn't work, I'll try and learn that UWP for this little app I want to write.
NVDA's windows 10 OCR recognition is defined in globalCommands.py, starting on line 2185:
def script_recognizeWithUwpOcr(self, gesture):
    if not winVersion.isUwpOcrAvailable():
        # Translators: Reported when Windows 10 OCR is not available.
        ui.message(_("Windows 10 OCR not available"))
        return
    from contentRecog import uwpOcr, recogUi
    recog = uwpOcr.UwpOcr()
    recogUi.recognizeNavigatorObject(recog)
Following the function trail, here is what I find, from outermost to innermost:
# contentRecog/uwpOcr.py
class UwpOcr(ContentRecognizer):
    # __init__ method only
    def __init__(self, language=None):
        """
        @param language: The language code of the desired recognition language,
            C{None} to use the user's configured language.
        """
        if language:
            self.language = language
        else:
            self.language = getConfigLanguage()
        self._dll = NVDAHelper.getHelperLocalWin10Dll()

# NVDAHelper.py:
LOCAL_WIN10_DLL_PATH = os.path.join(versionedLibPath, "nvdaHelperLocalWin10.dll")

def getHelperLocalWin10Dll():
    """Get a ctypes WinDLL instance for the nvdaHelperLocalWin10 dll.
    This is a C++/CX dll used to provide access to certain UWP functionality.
    """
    return windll[LOCAL_WIN10_DLL_PATH]

# from earlier in the file...
versionedLibPath = 'lib'
# assumption proven: this links to C:\program files (x86)\NVDA\lib\2018.3.2\nvdaHelperLocalWin10.dll
# contentRecog/recogUi.py:
def recognizeNavigatorObject(recognizer):
    """User interface function to recognize content in the navigator object.
    This should be called from a script or in response to a GUI action.
    @param recognizer: The content recognizer to use.
    @type recognizer: L{contentRecog.ContentRecognizer}
    """
    global _activeRecog
    if isinstance(api.getFocusObject(), RecogResultNVDAObject):
        # Translators: Reported when content recognition (e.g. OCR) is attempted,
        # but the user is already reading a content recognition result.
        ui.message(_("Already in a content recognition result"))
        return
    nav = api.getNavigatorObject()
    # Translators: Reported when content recognition (e.g. OCR) is attempted,
    # but the content is not visible.
    notVisibleMsg = _("Content is not visible")
    try:
        left, top, width, height = nav.location
    except TypeError:
        log.debugWarning("Object returned location %r" % nav.location)
        ui.message(notVisibleMsg)
        return
    try:
        imgInfo = RecogImageInfo.createFromRecognizer(left, top, width, height, recognizer)
    except ValueError:
        ui.message(notVisibleMsg)
        return
    if _activeRecog:
        _activeRecog.cancel()
    # Translators: Reporting when content recognition (e.g. OCR) begins.
    ui.message(_("Recognizing"))
    sb = screenBitmap.ScreenBitmap(imgInfo.recogWidth, imgInfo.recogHeight)
    pixels = sb.captureImage(left, top, width, height)
    _activeRecog = recognizer
    recognizer.recognize(pixels, imgInfo, _recogOnResult)
# And, back to contentRecog/uwpOcr.py:
def recognize(self, pixels, imgInfo, onResult):
    self._onResult = onResult

    @uwpOcr_Callback
    def callback(result):
        # If self._onResult is None, recognition was cancelled.
        if self._onResult:
            if result:
                data = json.loads(result)
                self._onResult(LinesWordsResult(data, imgInfo))
            else:
                self._onResult(RuntimeError("UWP OCR failed"))
        self._dll.uwpOcr_terminate(self._handle)
        self._callback = None
        self._handle = None

    self._callback = callback
    self._handle = self._dll.uwpOcr_initialize(self.language, callback)
    if not self._handle:
        onResult(RuntimeError("UWP OCR initialization failed"))
        return
    self._dll.uwpOcr_recognize(self._handle, pixels, imgInfo.recogWidth, imgInfo.recogHeight)
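That uwpOcr_Callback decorator is a ctypes function-pointer type (I believe NVDA declares it as WINFUNCTYPE(None, c_wchar_p)), which is how the DLL calls back into Python with the JSON result. A minimal, cross-platform sketch of the same pattern (using CFUNCTYPE so it runs anywhere; all names here are mine, not NVDA's):

```python
from ctypes import CFUNCTYPE, c_wchar_p

# Hypothetical callback type mirroring the shape of NVDA's uwpOcr_Callback:
# a function taking a wide-string result and returning nothing.
OcrCallbackType = CFUNCTYPE(None, c_wchar_p)

results = []

@OcrCallbackType
def on_result(json_text):
    # The DLL would invoke this with the recognized text as JSON,
    # or NULL (None here) on failure.
    results.append(json_text)

# Simulate the DLL firing the callback:
on_result('{"demo": true}')
print(results[0])
```

The decorated function can be handed to a WinDLL call exactly as uwpOcr.py hands callback to uwpOcr_initialize; ctypes keeps the marshalling details out of your way.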
Now, we break out of Python, into C++:
// nvdaHelper/localWin10/uwpOcr.cpp:

// Corresponds to self._dll.uwpOcr_terminate(self._handle):
void __stdcall uwpOcr_terminate(UwpOcr* instance) {
    delete instance;
}

// Corresponds to self._handle = self._dll.uwpOcr_initialize(self.language, callback):
UwpOcr* __stdcall uwpOcr_initialize(const char16* language, uwpOcr_Callback callback) {
    auto engine = OcrEngine::TryCreateFromLanguage(ref new Language(ref new String(language)));
    if (!engine)
        return nullptr;
    auto instance = new UwpOcr;
    instance->engine = engine;
    instance->callback = callback;
    return instance;
}

// Corresponds to self._dll.uwpOcr_recognize(self._handle, pixels, imgInfo.recogWidth, imgInfo.recogHeight):
void __stdcall uwpOcr_recognize(UwpOcr* instance, const RGBQUAD* image, unsigned int width, unsigned int height) {
    unsigned int numBytes = sizeof(RGBQUAD) * width * height;
    auto buf = ref new Buffer(numBytes);
    buf->Length = numBytes;
    BYTE* bytes = getBytes(buf);
    memcpy(bytes, image, numBytes);
    auto sbmp = SoftwareBitmap::CreateCopyFromBuffer(buf, BitmapPixelFormat::Bgra8, width, height, BitmapAlphaMode::Ignore);
    task<OcrResult^> ocrTask = create_task(instance->engine->RecognizeAsync(sbmp));
    ocrTask.then([instance, sbmp] (OcrResult^ result) {
        auto lines = result->Lines;
        auto jLines = ref new JsonArray();
        for (unsigned short l = 0; l < lines->Size; ++l) {
            auto words = lines->GetAt(l)->Words;
            auto jWords = ref new JsonArray();
            for (unsigned short w = 0; w < words->Size; ++w) {
                auto word = words->GetAt(w);
                auto jWord = ref new JsonObject();
                auto rect = word->BoundingRect;
                jWord->Insert("x", JsonValue::CreateNumberValue(rect.X));
                jWord->Insert("y", JsonValue::CreateNumberValue(rect.Y));
                jWord->Insert("width", JsonValue::CreateNumberValue(rect.Width));
                jWord->Insert("height", JsonValue::CreateNumberValue(rect.Height));
                jWord->Insert("text", JsonValue::CreateStringValue(word->Text));
                jWords->Append(jWord);
            }
            jLines->Append(jWords);
        }
        instance->callback(jLines->Stringify()->Data());
    }).then([instance] (task<void> previous) {
        // Catch any unhandled exceptions that occurred during these tasks.
        try {
            previous.get();
        } catch (Platform::Exception^ e) {
            LOG_ERROR(L"Error " << e->HResult << L": " << e->Message->Data());
            instance->callback(NULL);
        }
    });
}
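For reference, the JSON that uwpOcr.cpp stringifies above is an array of lines, each line being an array of word objects with x/y/width/height/text keys; that's exactly what LinesWordsResult consumes back on the Python side. A quick sketch of pulling plain text out of that shape (the sample payload is made up, but matches the structure the C++ code builds):

```python
import json

# Sample payload in the shape produced by uwpOcr_recognize's callback:
# a JSON array of lines, each an array of word objects.
payload = '''[
    [{"x": 10, "y": 5, "width": 40, "height": 12, "text": "Hello"},
     {"x": 55, "y": 5, "width": 50, "height": 12, "text": "world"}],
    [{"x": 10, "y": 20, "width": 30, "height": 12, "text": "Bye"}]
]'''

lines = json.loads(payload)
# Join words with spaces, lines with newlines, ignoring the bounding boxes.
text = "\n".join(" ".join(word["text"] for word in line) for line in lines)
print(text)
```

The bounding rectangles are what let NVDA route the review cursor to the on-screen position of each word; if you only want the text, you can throw them away as above.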
So, there's how NVDA does it. Very complex, and far too complicated for what you're trying to do here. A much, much simpler (and cleaner) way to do things might look like:
using Windows.Media.Ocr;
// all your other code here...
// Recognize via OCR (this must run in an async method, since we await below)
OcrEngine ocrEngine = OcrEngine.TryCreateFromUserProfileLanguages();
// OCR an image
if (ocrEngine != null) {
    // Load your image into a SoftwareBitmap; that's up to you.
    var result = await ocrEngine.RecognizeAsync(bitmap);
    // Display the text
    Console.WriteLine(result.Text);
} else {
    // report error
}
I may actually consider making a Python CDLL for this; I'd use NVDA's if it wasn't so over-complicated and inflexible.
Yeah, I can't make a DLL for it. UWP is extremely restrictive for security reasons, and while it's a COM object, it's usually called from UWP (like NVDA does). The difference is that NVDA OCRs the screen at the navigator object's current pixel coordinates, and since my DLL isn't attached to NVDA, I have no way of getting those. So I'm not exactly sure how else to access this API other than through UWP, and like I said, UWP is restrictive enough that:
1) I can't OCR the entire screen, and
2) I can't OCR a particular file unless it's in a particular set of directories.
Oh yeah, great indeed. On MSDN I read that I can convert a file to a software bitmap and then OCR it, but I haven't tried it yet.
I've tried. You're restricted to particular directories in UWP. You can't arbitrarily read from anywhere in the filesystem.
Can't I read a file from the file system, copy it to a directory that UWP OCR can read from, and then perform the magic?
Theoretically? Yes. If you use the standard .NET file I/O functions, you might be able to transform it into a software bitmap. I honestly have no idea.
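The copy-then-load part of that idea is at least straightforward to sketch. In plain Python (temporary directories standing in for the real source location and for whatever folder the UWP sandbox can actually read; I haven't verified which folders those are):

```python
import shutil
import tempfile
from pathlib import Path

# Stand-ins: a source file anywhere on disk, and a directory the
# sandboxed app is assumed to be able to read (hypothetical here).
with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as readable_dir:
    source = Path(src_dir) / "scan.png"
    source.write_bytes(b"fake image bytes")

    # Step 1: copy the file into the readable directory.
    staged = Path(readable_dir) / source.name
    shutil.copy2(source, staged)

    # Step 2 (not shown): the app would then load `staged` and convert
    # it to a software bitmap before handing it to the OCR engine.
    copied_ok = staged.read_bytes() == source.read_bytes()

print(copied_ok)
```

Whether the OCR engine will then accept the staged copy depends entirely on whether the target folder is really inside the sandbox's allowed set, which is the part neither of us has confirmed.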
I'll try. If that's not going to work, darn it.
I just wanted to write something useful...
I know. I wanted to make a Python DLL to make OCR easier to use, only to find I couldn't.
Let's see what will happen. I'll try to work around that stupid limitation, but tomorrow. Enough programming for today.
As always, thanks Ethin for help.