2023-11-21 10:48:49

I'm also having the issue with the navigator object reporting "No Content Visible," by the way.
My preferred way to grab screenshots with this thing is by mapping "Take a screenshot, then describe it using AI" in Input Gestures, though I can see why scanning just the navigator object could come in handy.
You might want to create an issue about this on GitHub. I created one earlier and Carter was able to fix it in just a couple of hours.

I can try to describe pricing tomorrow if I have time... or maybe I'll just make a PR so it can be added to the README.
We'll see. Tomorrow's gonna be busy and I need to get to bed!

2023-11-21 11:00:16

@25: Describing videos isn't technically possible at the moment.
Well, it is, but you have to capture each individual frame as a picture and send them all together.
Right now, the add-on doesn't support sending more than one image at a time, but if it did, you could technically do this. It would be time-consuming and also a little pricey, but it has actually been done before.
Check this out if you're curious. It's actually really incredible.
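If you're wondering what that would even look like, here's a rough sketch of packing several frames into one gpt-4-vision-preview request. Only the request format comes from the public API docs; the frame file names, prompt, and key are placeholders I made up, not anything the add-on actually does.

```python
# Hypothetical sketch: sending several video frames in a single
# gpt-4-vision-preview request. Frames are assumed to already exist as JPEGs.
import base64
import requests

API_KEY = "sk-..."  # placeholder OpenAI API key

def encode_image(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

frames = ["frame_01.jpg", "frame_02.jpg", "frame_03.jpg"]  # made-up file names
content = [{"type": "text", "text": "These are consecutive frames from a video. Describe what happens."}]
for path in frames:
    content.append({
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{encode_image(path)}"},
    })

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4-vision-preview",
        "messages": [{"role": "user", "content": content}],
        "max_tokens": 300,
    },
)
print(response.json()["choices"][0]["message"]["content"])
```

Every frame counts as image input tokens, which is why doing this for a whole video gets expensive fast.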

2023-11-21 11:03:17

So there are now other ways to access GPT-4 without needing to subscribe to Plus?

Kind regards!

2023-11-21 11:31:08

Yes, you can use it over the API.

Ugh, whatever. I give up. You have a good point about rate limits, to be fair. I'll wait until there are benchmarks and then come back here and say "I told you so".

2023-11-21 15:52:47

@29 So what is your secondary language? Or the one you speak well enough to get image descriptions in?

2023-11-21 16:23:01

So at first I missed that there's a shortcut for directly describing whatever the navigator object is on. That seems to work, and the same goes if I set a shortcut for describing the current focus. It's only when you go through the menu that it doesn't work, so ultimately not much of a problem.

2023-11-21 18:03:38

Can I get a GPT-4 key from someone? It doesn't work for me with my free key.

2023-11-21 18:11:35

@cartertemm
Kick ass, dude! I've just been using a Hugging Face frontend that's slow and buggy AF. This is much better, thank you!

2023-11-21 18:25:01

I know it mentions the ability to read graphs, and I have used Be My AI to describe graphs on my phone before, but I'm wondering how accurate the descriptions are. Could you use this to pass a math test with inaccessible graphs, or is it too risky due to hallucinations?

2023-11-21 18:39:23

gal wrote:

Can I get a GPT-4 key from someone? It doesn't work for me with my free key.

Given that this uses resources on a pay-as-you-go basis, I doubt somebody will give you a free key and let you spend their money. I hope you can find somebody who will, though.

I have similar suggestions/problems to others. The request should not block, the act of popping up a context menu makes things go out of focus when you try to recognize them, and yeah, it would be nice to have conversational capabilities. The add-on is great though, and I'm glad somebody finally did it.

2023-11-21 19:41:02

I've been blown away by the support and use cases people have been coming up with, thank you!

To address a few questions:

aaron77 wrote:

Does the Optimize images for size checkbox set the detail parameter to low when calling the Vision API?

This was the intention, yes. It was supposed to also take pictures in lower resolution and then compress them to cut down on upload time. I'm still considering how that piece should actually work because, unsurprisingly, snapping a screenshot, modifying the resolution, and asking the API to do the same doesn't actually result in any performance improvements; sometimes the opposite, even.
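For anyone curious what that maps to on the wire, here's a sketch. The detail field is the one documented in the public Vision API docs; the helper name and variables are just illustrative, not the add-on's actual code.

```python
# Sketch of what the "Optimize images for size" option presumably toggles:
# the "detail" field in the image_url part of the request.
import base64

def build_image_part(screenshot_path, optimize_for_size=True):
    with open(screenshot_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {
            "url": f"data:image/png;base64,{encoded}",
            # "low" has the model work from a 512x512 downscale at a flat
            # token cost; "auto"/"high" tiles the image and costs more tokens.
            "detail": "low" if optimize_for_size else "auto",
        },
    }
```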

Aaron77 wrote:

Have you considered allowing users to reply to the image descriptions? That would open up many more use cases, I think.

The thought has certainly crossed my mind, and once I can ensure the rest of the features are stable, I might work on it. Problem is, unlike any of the other APIs, vision preview doesn't seem to remember the conversation, which means I'd need to re-send every message, image included, whenever the user asks a question. This would get crazy expensive... and fast.
To be honest, I was kinda holding out hope that Sama and team would address this limitation, but given the shitshow going on at OpenAI right now, I've gone from bullish to just being glad developers haven't lost the ability to build on the models in a weekend.
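To make the cost problem concrete, a conversational follow-up against a stateless endpoint would have to look roughly like this. The key and image data URL are placeholders, and this isn't the add-on's actual code, just the shape of the problem.

```python
# Why follow-ups get pricey: every turn re-sends the whole history, image
# included, so those input tokens are billed again on each request.
import requests

API_KEY = "sk-..."                              # placeholder
IMAGE_DATA_URL = "data:image/png;base64,..."    # the base64-encoded screenshot

history = [
    {"role": "user", "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": IMAGE_DATA_URL}},
    ]},
]

def ask_followup(question):
    history.append({"role": "user", "content": [{"type": "text", "text": question}]})
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "gpt-4-vision-preview",
            "messages": history,   # the image rides along on every single call
            "max_tokens": 250,
        },
    )
    answer = resp.json()["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    return answer
```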

Aaron77 wrote:

Could you possibly make the prompt field a multiline field?

Easily. One caveat to be aware of: the NVDA settings dialog appears to intercept the enter, control+enter, and shift+enter keys. I'll have to find a way to override this behavior, but it'll probably be a part of the next release.
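For the wxPython-inclined, one possible shape of a workaround is below. Very much a sketch and not the add-on's actual code; whether NVDA's dialog lets the control see the key before the default button grabs it is exactly the open question.

```python
# Sketch: a multiline prompt control that inserts newlines itself instead of
# relying on the dialog's default enter handling.
import wx

def make_prompt_control(parent):
    ctrl = wx.TextCtrl(parent, style=wx.TE_MULTILINE, size=(400, 100))

    def on_key(event):
        if event.GetKeyCode() in (wx.WXK_RETURN, wx.WXK_NUMPAD_ENTER):
            ctrl.WriteText("\n")  # insert the newline ourselves
        else:
            event.Skip()

    ctrl.Bind(wx.EVT_KEY_DOWN, on_key)
    return ctrl
```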

Aaron77 wrote:

Could you possibly look into having the function that calls the Vision API run on a separate thread?

Good catch! It's already doing this when you snap an object from the menu, but other recognitions are indeed performed on the main thread. This will be done for the next release as well.
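For anyone curious, the pattern is roughly this. describe_image() here is a made-up stand-in for the function that performs the HTTP request; ui.message and wx.CallAfter are the real NVDA/wx calls for announcing a result safely from the main thread. Not the add-on's literal code, just the shape of it.

```python
# Push the blocking network call off NVDA's main thread.
import threading
import wx
import ui

def describe_in_background(image_data, prompt):
    def worker():
        try:
            description = describe_image(image_data, prompt)  # hypothetical blocking API call
        except Exception as exc:
            description = f"Error: {exc}"
        # Hop back to the main thread before touching speech.
        wx.CallAfter(ui.message, description)

    threading.Thread(target=worker, daemon=True).start()
```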

Speaking of the menu, there is in fact a fairly annoying bug when we try to describe focus or navigator objects without using their defined keystrokes. It occurs because popping up the menu changes both the focus and navigator positions, and somehow simply caching and setting them again is insufficient. I'll continue to play around with some of the more obscure pieces of NVDA to figure out why this is happening. It's 100% possible, just a matter of finding and reciting the right incantations.
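For reference, the naive version of that cache-and-restore idea looks like this with NVDA's api module; as said, this alone doesn't seem to cut it once the menu has popped up.

```python
# The obvious first attempt: remember both positions, show the menu, put them back.
import api

cached_focus = api.getFocusObject()
cached_nav = api.getNavigatorObject()
# ... show the menu, which moves both positions ...
api.setNavigatorObject(cached_nav)
api.setFocusObject(cached_focus)
```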

As a tip, if you add "for someone who is blind" to the end of the prompt, it appears to add greater detail to the explanations, undoubtedly a byproduct of the collab with Be My Eyes.

@Defender

Thanks. It's been a while; I hope you're well. Out of curiosity, what frontend? Is it the LLaVA demo that's been making its rounds recently? Or MiniGPT-4?

2023-11-21 20:03:17

Unfortunately, the only stateful API OpenAI has is the new Assistants API, which does not accept image inputs yet. Even the Assistants API, though, will charge you the entire conversation's token cost on every request. Prices will definitely skyrocket, and perhaps the conversation option should be behind a disclaimer prompt warning people that they will spend a lot of money.
Thankfully, input tokens are only $0.01 per thousand, and when you send new messages, the entire context of the conversation up to that point is counted as one blob of input tokens, so it's not as expensive as you'd think.
GPT-4 pricing was so much worse when it was $0.03 and $0.06 per thousand, though!
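As a quick sanity check on those numbers (the token counts below are made up; only the $0.01 rate comes from above):

```python
# Back-of-the-envelope cost of one follow-up request in a re-sent conversation.
input_price_per_1k = 0.01    # dollars per 1,000 input tokens, as quoted above

conversation_tokens = 3000   # the whole history, re-sent with each new message
new_message_tokens = 200     # the follow-up question itself

cost = (conversation_tokens + new_message_tokens) / 1000 * input_price_per_1k
print(f"~${cost:.3f} of input tokens per follow-up")  # ~$0.032
```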

2023-11-21 20:37:04

It's cool, but I often get an error saying "the read operation timed out". What's wrong?

Best regards: Marco

2023-11-21 20:54:01

It means the add-on didn't wait long enough to receive the response from the API. The time it waits is adjustable in settings, though unless you build the add-on from the source in the repository, there's a bug where it doesn't respect the time you set.
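Under the hood it boils down to something like the following; the values and placeholder variables here are illustrative, not the add-on's actual code.

```python
# Where the setting ends up: the timeout passed to the HTTP call.
import requests

timeout_seconds = 30  # whatever the user picked in settings (placeholder value)
headers = {"Authorization": "Bearer sk-..."}  # placeholder key
payload = {"model": "gpt-4-vision-preview", "messages": [], "max_tokens": 250}  # built elsewhere in practice

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,
    json=payload,
    timeout=timeout_seconds,  # exceeding this surfaces as a read timeout error
)
```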

2023-11-21 21:52:30 (edited by 4SenseGaming 2023-11-21 21:53:42)

Incredible job! When Be My AI first started, I thought it was cool but wondered when someone was going to release something similar for Windows. A dream come true. Thanks a ton for this, Carter!
DJWolfy literally has no clue what they're talking about here. Yeah, the conversational part of the model does tend to perform worse in minor languages, but it still gets the point across, even if sometimes with worse grammar. Czech, which is my native language, was only 0.07% of the total training data used, and it still works. I have noticed no difference at all between prompting it to describe an image in Czech or English; the level of detail and accuracy is generally the same. I'm no researcher, true, but I've done enough practical testing to claim this with certainty. I still prefer English for chats, but I use both languages interchangeably for image descriptions with equal results. Hallucinations have nothing to do with the language used at all, ever.
Lukas

2023-11-21 23:05:36

Version 2023.11.21 released with the following enhancements:

  • Operations that are triggered by a keystroke no longer cause NVDA to freeze up for a few seconds.

  • The prompt field found in the settings dialog now accepts multiple lines of input. Unfortunately, you can't yet use enter, control+enter, or shift+enter to insert a new line due to the way that NVDA settings dialogs work, but text can still be pasted here.

  • Timeouts now behave as one might expect. If you ever get an error saying something like "the read operation timed out", just increase this value.

I'm also trying to submit this to the add-on store for wider availability, but so far it's not working too well.

2023-11-22 01:31:35

@40: That's awesome! Are you a ChatGPT user? If so, have you tried voice chatting with it in Czech? Their TTS voices are really good at speaking Spanish, though they do have a slight American accent. I'm always curious to hear how well they perform with other languages.
I think as of earlier today, all ChatGPT users can voice chat, though if you're not a Plus subscriber you'll be limited to speaking with GPT-3.5. The TTS engine should be identical, though.

2023-11-22 03:00:19

FYI, I had to push a quick hotfix, version 2023.11.22.

Changes aren't super noticeable, except I fixed a bug where the menu wouldn't show up sometimes, and spaces were removed in the internal name to honor manifest conventions. Let's hope we can avoid another release tomorrow lest we get a ridiculous version number lol.

2023-11-22 04:06:00

As much as I want to try this thing, OpenAI rejects my card for some reason, so I can't top up my account.

2023-11-22 04:26:40

So I'm not sure what changed, but with the new version, the shortcuts for object nav and focus also just report content not visible. Is this happening to anyone else?

2023-11-22 05:12:50

Oh yep, neither of them is working for me either!
Guess we'll be having a funny version number tomorrow after all! Haha.

2023-11-22 06:38:20

Welp, multi-threaded lookups are the gift that keeps on giving. Just when I thought I'd learned every lesson imaginable about them, along comes 2023.11.22.1.

In short, you should now be able to recognize any object, whether from the menu or not. An error will be spoken if the screen curtain is enabled while doing this.

Thanks again for all the help locating these issues. I'm not going to claim it's stable quite yet, but we're getting close.

2023-11-22 07:12:10 (edited by Shadowcat 2023-11-22 07:17:53)

So far, I'm loving this thing. Awesome work! The one issue I have, though, is that if you change your prompt to ask for as detailed a description as possible, the response gets cut off partway through. Is this something that can be fixed, or is it a limitation of the platform that I, lacking knowledge of how this stuff actually works, just don't know about?
As an example of what I mean,

The image displays a plush toy representing an anthropomorphic orange cat. The cat plush stands upright against a neutral background with a subtle geometric pattern. It features a bright orange fur with darker orange stripes along its back and tail. The plush's belly is a lighter, almost yellowish fur which stretches up to the bottom of its face.

The cat's facial expression is attentive with wide-open green eyes centered around large black pupils, which provide a sense of curiosity or alertness. Just below the eyes, there is a modest triangular nose in a pale pinkish color. Thin black lines serve as whiskers, protruding from white patches on either side of the cat's face. The cat's ears are perked up and tipped with a darker orange hue akin to the stripes on its back.

Around the neck, this feline character sports a white collar with a black stripe, perhaps evoking the look of a harness or a specialized outfit. Additionally, positioned between the toy's front paws is a small, stylized plush representation of a blue and gray robot with circular green eyes and antenna, suggesting a futuristic or technological accessory accompanying the cat character. The cat is seated, and both its hands and feet are a yellow color, providing a contrast to the

"You know nothing of death... allow me to teach you!" Dreadlich Tamsin
Download the latest version of my Bokura no Daibouken 3 guide here.

2023-11-22 07:42:06

To allow for a larger response, you can increase the max tokens setting; the default is 250. Keep in mind that you get charged for the tokens you use at a rate of $0.01 per 1,000.
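To put that in perspective, using the rate quoted above (and remembering you only pay for tokens that are actually generated, up to the cap):

```python
# Worst-case output cost at a couple of max token settings.
price_per_1k = 0.01  # dollars per 1,000 tokens, as quoted above

for max_tokens in (250, 1000):
    print(f"{max_tokens} tokens -> at most ${max_tokens / 1000 * price_per_1k:.4f}")
# 250 tokens -> at most $0.0025
# 1000 tokens -> at most $0.0100
```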

2023-11-22 08:00:52

@cartertemm
It was MiniGPT-4, yeah.