2018-12-07 13:55:06 (edited by Nuno 2018-12-07 13:55:54)

Hello!
How can I access the win10 OCR API using C# and the dot net framework? Where should I search?

If you want to contact me, do not use the forum PM. I respond once a year or two, when I need to write a PM myself. I apologize for the inconvenience.
Telegram: Nuno69a
E-Mail: nuno69a (at) gmail (dot) com

2018-12-07 15:33:28

That's a COM API, and it's documented on MSDN.

2018-12-07 15:42:44 (edited by cartertemm 2018-12-07 15:43:53)

Have a look at the Windows.Media.Ocr namespace. It implements the UWP OCR functionality, similar to what you see in NVDA. As a result it only works on Windows 10, but it's the best you're going to get natively. In my experience the results are far better than Tesseract and the other engines. Are you OK with OCR functionality being accessible to only a subset of users?
Here's a sample from Microsoft:
https://github.com/Microsoft/Windows-un … amples/OCR
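Just to give you a feel for the API, something along these lines (written from memory, so treat it as a sketch rather than tested code; the class and method names are just placeholders) will tell you whether the engine is usable on a machine and which languages it can recognize:

using System;
using Windows.Media.Ocr;

static class OcrProbe
{
    public static void ListLanguages()
    {
        // Static properties on OcrEngine describe what the system supports.
        Console.WriteLine("Max image dimension: " + OcrEngine.MaxImageDimension);
        foreach (var language in OcrEngine.AvailableRecognizerLanguages)
            Console.WriteLine(language.DisplayName + " (" + language.LanguageTag + ")");
    }
}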

2018-12-07 16:42:16

Yes, I'm OK with that. That's what I wanted to use. But the thing that bothers me is: do I really have to use UWP? Can't I use WinForms?

If you want to contact me, do not use the forum PM. I respond once a year or two, when I need to write a PM myself. I apologize for the inconvenience.
Telegram: Nuno69a
E-Mail: nuno69a (at) gmail (dot) com

2018-12-07 18:36:54

@4, you probably could, though I wouldn't recommend it. I doubt Visual Studio (or .NET, for that matter) would be happy if you mixed two .NET libraries that are practically opposites. Give it a go though; it's worth a try.
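If you do try it, something along these lines is roughly what I'd expect it to look like (a sketch only, not tested: it assumes you've added references to Windows.winmd from the Windows 10 SDK and System.Runtime.WindowsRuntime.dll, or pulled in the Microsoft.Windows.SDK.Contracts NuGet package, so the WinRT types resolve and their async operations can be awaited; the class and method names are made up):

using System.Threading.Tasks;
using Windows.Graphics.Imaging;  // SoftwareBitmap
using Windows.Media.Ocr;         // OcrEngine, OcrResult

static class OcrFromWinForms
{
    // Runs OCR over an already-decoded SoftwareBitmap and returns the text.
    public static async Task<string> RecognizeAsync(SoftwareBitmap bitmap)
    {
        OcrEngine engine = OcrEngine.TryCreateFromUserProfileLanguages();
        if (engine == null)
            return null; // no OCR language pack is installed
        OcrResult result = await engine.RecognizeAsync(bitmap);
        return result.Text;
    }
}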

"On two occasions I have been asked [by members of Parliament!]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."    — Charles Babbage.
My Github

2018-12-07 19:15:58

It's a COM API after all, so as far as I can see you can use it without being UWP at all, since NVDA isn't UWP nor Windows Forms and can access it anyway. It could actually help to take a look at the NVDA source to see how they access it.
Best Regards.
Hijacker

2018-12-07 20:14:18

Yeah, I'll do that. Or if it doesn't work, I'll try to learn UWP for this little app I want to write.

If you want to contact me, do not use the forum PM. I respond once a year or two, when I need to write a PM myself. I apologize for the inconvenience.
Telegram: Nuno69a
E-Mail: nuno69a (at) gmail (dot) com

2018-12-07 21:35:24 (edited by Ethin 2018-12-07 21:39:36)

NVDA's Windows 10 OCR support is defined in globalCommands.py, starting on line 2185:

def script_recognizeWithUwpOcr(self, gesture):
    if not winVersion.isUwpOcrAvailable():
        # Translators: Reported when Windows 10 OCR is not available.
        ui.message(_("Windows 10 OCR not available"))
        return
    from contentRecog import uwpOcr, recogUi
    recog = uwpOcr.UwpOcr()
    recogUi.recognizeNavigatorObject(recog)

Following the function trail, here is what I find, from outermost to innermost:

# contentRecog/uwpOcr.py
class UwpOcr(ContentRecognizer):
# __init__ method only
    def __init__(self, language=None):
        """
        @param language: The language code of the desired recognition language,
            C{None} to use the user's configured language.
        """
        if language:
            self.language = language
        else:
            self.language = getConfigLanguage()
        self._dll = NVDAHelper.getHelperLocalWin10Dll()
# NVDAHelper.py:
LOCAL_WIN10_DLL_PATH = os.path.join(versionedLibPath,"nvdaHelperLocalWin10.dll")
def getHelperLocalWin10Dll():
    """Get a ctypes WinDLL instance for the nvdaHelperLocalWin10 dll.
    This is a C++/CX dll used to provide access to certain UWP functionality.
    """
    return windll[LOCAL_WIN10_DLL_PATH]
# from earlier in the file...
versionedLibPath='lib'
# assumption proven: this links to C:\program files (x86)\NVDA\lib\2018.3.2\nvdaHelperLocalWin10.dll
# contentRecog/recogUi.py:
def recognizeNavigatorObject(recognizer):
    """User interface function to recognize content in the navigator object.
    This should be called from a script or in response to a GUI action.
    @param recognizer: The content recognizer to use.
    @type recognizer: L{contentRecog.ContentRecognizer}
    """
    global _activeRecog
    if isinstance(api.getFocusObject(), RecogResultNVDAObject):
        # Translators: Reported when content recognition (e.g. OCR) is attempted,
        # but the user is already reading a content recognition result.
        ui.message(_("Already in a content recognition result"))
        return
    nav = api.getNavigatorObject()
    # Translators: Reported when content recognition (e.g. OCR) is attempted,
    # but the content is not visible.
    notVisibleMsg = _("Content is not visible")
    try:
        left, top, width, height = nav.location
    except TypeError:
        log.debugWarning("Object returned location %r" % nav.location)
        ui.message(notVisibleMsg)
        return
    try:
        imgInfo = RecogImageInfo.createFromRecognizer(left, top, width, height, recognizer)
    except ValueError:
        ui.message(notVisibleMsg)
        return
    if _activeRecog:
        _activeRecog.cancel()
    # Translators: Reporting when content recognition (e.g. OCR) begins.
    ui.message(_("Recognizing"))
    sb = screenBitmap.ScreenBitmap(imgInfo.recogWidth, imgInfo.recogHeight)
    pixels = sb.captureImage(left, top, width, height)
    _activeRecog = recognizer
    recognizer.recognize(pixels, imgInfo, _recogOnResult)
# And, back to contentRecog/uwpOcr.py:
    def recognize(self, pixels, imgInfo, onResult):
        self._onResult = onResult
        @uwpOcr_Callback
        def callback(result):
            # If self._onResult is None, recognition was cancelled.
            if self._onResult:
                if result:
                    data = json.loads(result)
                    self._onResult(LinesWordsResult(data, imgInfo))
                else:
                    self._onResult(RuntimeError("UWP OCR failed"))
            self._dll.uwpOcr_terminate(self._handle)
            self._callback = None
            self._handle = None
        self._callback = callback
        self._handle = self._dll.uwpOcr_initialize(self.language, callback)
        if not self._handle:
            onResult(RuntimeError("UWP OCR initialization failed"))
            return
        self._dll.uwpOcr_recognize(self._handle, pixels, imgInfo.recogWidth, imgInfo.recogHeight)

Now, we break out of Python, into C++:

// nvdaHelper/localWin10/uwpOcr.cpp:
// Corresponds to self._dll.uwpOcr_terminate(self._handle):
void __stdcall uwpOcr_terminate(UwpOcr* instance) {
    delete instance;
}
// Corresponds to self._handle = self._dll.uwpOcr_initialize(self.language, callback):
UwpOcr* __stdcall uwpOcr_initialize(const char16* language, uwpOcr_Callback callback) {
    auto engine = OcrEngine::TryCreateFromLanguage(ref new Language(ref new String(language)));
    if (!engine)
        return nullptr;
    auto instance = new UwpOcr;
    instance->engine = engine;
    instance->callback = callback;
    return instance;
}
// corresponds to self._dll.uwpOcr_recognize(self._handle, pixels, imgInfo.recogWidth, imgInfo.recogHeight):
void __stdcall uwpOcr_recognize(UwpOcr* instance, const RGBQUAD* image, unsigned int width, unsigned int height) {
    unsigned int numBytes = sizeof(RGBQUAD) * width * height;
    auto buf = ref new Buffer(numBytes);
    buf->Length = numBytes;
    BYTE* bytes = getBytes(buf);
    memcpy(bytes, image, numBytes);
    auto sbmp = SoftwareBitmap::CreateCopyFromBuffer(buf, BitmapPixelFormat::Bgra8, width, height, BitmapAlphaMode::Ignore);
    task<OcrResult^> ocrTask = create_task(instance->engine->RecognizeAsync(sbmp));
    ocrTask.then([instance, sbmp] (OcrResult^ result) {
        auto lines = result->Lines;
        auto jLines = ref new JsonArray();
        for (unsigned short l = 0; l < lines->Size; ++l) {
            auto words = lines->GetAt(l)->Words;
            auto jWords = ref new JsonArray();
            for (unsigned short w = 0; w < words->Size; ++w) {
                auto word = words->GetAt(w);
                auto jWord = ref new JsonObject();
                auto rect = word->BoundingRect;
                jWord->Insert("x", JsonValue::CreateNumberValue(rect.X));
                jWord->Insert("y", JsonValue::CreateNumberValue(rect.Y));
                jWord->Insert("width", JsonValue::CreateNumberValue(rect.Width));
                jWord->Insert("height", JsonValue::CreateNumberValue(rect.Height));
                jWord->Insert("text", JsonValue::CreateStringValue(word->Text));
                jWords->Append(jWord);
            }
            jLines->Append(jWords);
        }
        instance->callback(jLines->Stringify()->Data());
    }).then([instance] (task<void> previous) {
        // Catch any unhandled exceptions that occurred during these tasks.
        try {
            previous.get();
        } catch (Platform::Exception^ e) {
            LOG_ERROR(L"Error " << e->HResult << L": " << e->Message->Data());
            instance->callback(NULL);
        }
    });
}

So, there's how NVDA does it. It's very complex, and far too complicated for what you're trying to do here. A much, much simpler (and cleaner) way to do it might look like this:

using Windows.Media.Ocr;
// all your other code here...
// Recognize via OCR (this has to run inside an async method so await works)
OcrEngine ocrEngine = OcrEngine.TryCreateFromUserProfileLanguages();
if (ocrEngine != null) {
    // Load your image into a SoftwareBitmap; that part is up to you.
    // OCR the image
    var result = await ocrEngine.RecognizeAsync(bitmap);
    // Display the recognized text
    Console.WriteLine(result.Text);
} else {
    // Report that no OCR language is available
}

I may actually consider making a DLL for this that could be loaded from Python with ctypes; I'd use NVDA's if it weren't so over-complicated and inflexible.

"On two occasions I have been asked [by members of Parliament!]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."    — Charles Babbage.
My Github

2018-12-07 23:18:10

Yeah, I can't make a DLL for it. UWP is extremely restrictive for security reasons, and while it's a COM object, it's usually called from UWP (like NVDA does). The difference is that NVDA OCRs the screen at the navigator object's current pixel coordinates, and since my DLL isn't attached to NVDA, I have no way of getting those. So I'm not exactly sure how else to access this API other than through UWP, and like I said, UWP is restrictive enough that:
1) I can't OCR the entire screen, and
2) I can't OCR a particular file unless it's in a particular set of directories.

"On two occasions I have been asked [by members of Parliament!]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."    — Charles Babbage.
My Github

2018-12-08 11:30:18

Oh yeah, great indeed. On MSDN I read that I can convert a file to a SoftwareBitmap and then OCR it, but I haven't tried it yet.
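From what I understood of the docs, it would go roughly like this (an untested sketch; ImageFileOcr and OcrImageFileAsync are just names I made up, and inside a packaged UWP app GetFileFromPathAsync only works for locations the app is allowed to access):

using System.Threading.Tasks;
using Windows.Graphics.Imaging;  // BitmapDecoder, SoftwareBitmap
using Windows.Media.Ocr;         // OcrEngine, OcrResult
using Windows.Storage;           // StorageFile, FileAccessMode

static class ImageFileOcr
{
    public static async Task<string> OcrImageFileAsync(string path)
    {
        // Open the image and decode it into a SoftwareBitmap.
        StorageFile file = await StorageFile.GetFileFromPathAsync(path);
        using (var stream = await file.OpenAsync(FileAccessMode.Read))
        {
            BitmapDecoder decoder = await BitmapDecoder.CreateAsync(stream);
            SoftwareBitmap bitmap = await decoder.GetSoftwareBitmapAsync();
            // Run OCR over the bitmap.
            OcrEngine engine = OcrEngine.TryCreateFromUserProfileLanguages();
            if (engine == null)
                return null; // no OCR language available
            OcrResult result = await engine.RecognizeAsync(bitmap);
            return result.Text;
        }
    }
}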

If you want to contact me, do not use the forum PM. I respond once a year or two, when I need to write a PM myself. I apologize for the inconvenience.
Telegram: Nuno69a
E-Mail: nuno69a (at) gmail (dot) com

2018-12-08 21:24:22

I've tried. You're restricted to particular directories in UWP; you can't arbitrarily read from anywhere in the filesystem.

"On two occasions I have been asked [by members of Parliament!]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."    — Charles Babbage.
My Github

2018-12-08 21:30:01

Can't I read a file from the file system, copy it to a directory that the UWP OCR can read from, and then perform the magic?

If you want to contact me, do not use the forum PM. I respond once a year or two, when I need to write a PM myself. I apologize for the inconvenience.
Telegram: Nuno69a
E-Mail: nuno69a (at) gmail (dot) com

2018-12-08 22:06:12

Theoretically? Yes. If you use the standard .NET file I/O functions to read the file, you might be able to transform it into a software bitmap. I honestly have no idea.
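If you want to give it a shot, a rough sketch of what I mean might look like this (completely untested; it assumes a desktop .NET app with the WinRT references set up, since the AsRandomAccessStream() extension comes from System.Runtime.WindowsRuntime, and FileOcr is just a placeholder name):

using System.IO;
using System.Threading.Tasks;
using Windows.Graphics.Imaging;  // BitmapDecoder, SoftwareBitmap
using Windows.Media.Ocr;         // OcrEngine

static class FileOcr
{
    public static async Task<string> RecognizeFileAsync(string path)
    {
        // Ordinary .NET file IO, so the UWP folder restrictions don't apply here.
        byte[] bytes = File.ReadAllBytes(path);
        using (var memory = new MemoryStream(bytes))
        using (var winRtStream = memory.AsRandomAccessStream())
        {
            var decoder = await BitmapDecoder.CreateAsync(winRtStream);
            var bitmap = await decoder.GetSoftwareBitmapAsync();
            var engine = OcrEngine.TryCreateFromUserProfileLanguages();
            if (engine == null)
                return null; // no OCR language pack installed
            var result = await engine.RecognizeAsync(bitmap);
            return result.Text;
        }
    }
}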

"On two occasions I have been asked [by members of Parliament!]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."    — Charles Babbage.
My Github

2018-12-08 22:14:15

I'll try. If that's not going to work, darn it.
I just wanted to write something useful...

If you want to contact me, do not use the forum PM. I respond once a year or two, when I need to write a PM myself. I apologize for the inconvenience.
Telegram: Nuno69a
E-Mail: nuno69a (at) gmail (dot) com

2018-12-08 22:54:16

I know. I wanted to make a Python DLL to make OCR easier to use, only to find I couldn't.

"On two occasions I have been asked [by members of Parliament!]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."    — Charles Babbage.
My Github

2018-12-08 22:58:25

Let's see what happens. I'll try to work around that stupid limitation, but tomorrow. Enough programming for today.
As always, thanks Ethin for the help.

If you want to contact me, do not use the forum PM. I respond once a year or two, when I need to write a PM myself. I apologize for the inconvenience.
Telegram: Nuno69a
E-Mail: nuno69a (at) gmail (dot) com