2024-03-20 04:12:22

Interestingly, using those coordinates and right-clicking with NVDA doesn't seem to do anything at all here. What I can do, though, is physically move the mouse pointer around a little and then right-click. I don't get an option to open the image in a new tab, but it opens an immersive view dialogue that basically ends up doing the same thing. I wonder why it's different?

"You know nothing of death... allow me to teach you!" Dreadlich Tamsin
Download the latest version of my Bokura no Daibouken 3 guide here.

2024-03-20 04:29:23

The immersive view dialogue is what I get when I left click, so that's pretty strange.

Every record has been destroyed or falsified, every book rewritten, every picture has been repainted, every statue and street building has been renamed, every date has been altered. And the process is continuing day by day and minute by minute. History has stopped. Nothing exists except an endless present in which the Party is always right.

2024-03-20 11:37:04

@Exodus
Yeah that's fair. I used to use an image downloader for this, but it can be a huge pain sorting out all the crazy image link names and what's actually useful. Ironically, an AI might be able to help with that. LOL

2024-03-25 09:28:26

I use Ecommerce Image Downloader Plus, a Chrome extension that lets me download the main product image for Amazon through links at the bottom of the page. You can then have these images described.
https://chromewebstore.google.com/detai … nbgbnookii

2024-03-26 21:15:38

The version on GitHub now supports Gemini and the three Claude 3 models, so you can test whether, and by how much, it is better.

2024-03-27 11:35:37

Hey guys, I'm having an issue updating the addon. The GitHub page mentions that the latest version should have a dropdown where you can select which model to use. The thing is, in the NVDA settings under AI Content Describer, I have no such dropdown at all.

Is the newest version which supports this some sort of beta version? I see the latest as the 2024.03.13 version, but that's about it. Any idea what's going on?

2024-03-27 12:30:08

It's just not an official release yet. Clone the repository itself and copy the files in the addon directory, then update the manifest if required. Or just wait a bit; the author will probably release it soonish.

2024-03-28 03:17:18

Hey there,

Is this simple to do? I'm an extreme novice with anything relating to programming, so I have never done what you mention here. Thank you!
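
For anyone wondering what this involves: an .nvda-addon file is an ordinary zip archive with manifest.ini at its top level, so one alternative to copying files by hand is to zip up the repository's addon folder and install the result. The following is only a minimal Python sketch of that alternative, not the author's build process. It assumes the repository has already been cloned or downloaded and that its addon folder already contains a manifest.ini; if the manifest is generated at build time instead, the sketch stops with an error rather than producing a broken package. The function name and the paths in the comment are made up for illustration.

```python
import zipfile
from pathlib import Path


def build_addon(addon_dir, output_file):
    """Zip the contents of addon_dir into an .nvda-addon package."""
    addon_dir = Path(addon_dir)
    if not (addon_dir / "manifest.ini").is_file():
        raise FileNotFoundError("manifest.ini not found in " + str(addon_dir))
    with zipfile.ZipFile(output_file, "w", zipfile.ZIP_DEFLATED) as package:
        for path in sorted(addon_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to addon_dir so manifest.ini
                # ends up at the root of the archive.
                package.write(path, path.relative_to(addon_dir).as_posix())
    return Path(output_file)


# Hypothetical usage; adjust both paths to your own setup:
# build_addon("path/to/cloned-repo/addon", "AIContentDescriber-dev.nvda-addon")
```

Opening the resulting file from Explorer should prompt NVDA to install it like any other add-on. Waiting for the official release is still the simpler route.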

2024-03-30 05:01:19 (edited by cartertemm 2024-03-30 05:05:51)

Hello everyone,

Sorry for the silence on this. A new ask from a client of mine meant that my hands have been pretty tied over the last two weeks, but born out of necessity came face detection, modelled after VFO's recently introduced face in view feature.

I have published a pre-release for anyone who is kind or curious enough to test. Here is the changelog.

New in this version:

  • Added face detection, which will tell you whether you are clearly centered in the frame of your camera. The hotkey NVDA+shift+j is bound by default, and options to select a different device or release the camera so that other apps can use it may be found in the AI content describer context menu.

  • Added an option and script to take a picture from the onboard webcam.

  • Added Google's Gemini model.

  • Added the three major Claude 3 models (Haiku, Sonnet, and Opus).

  • Added support for Llama.cpp.

  • Rewrote the AI content describer section of the NVDA settings dialog, making many of the preferences model-specific. On installation of the new version your settings should automatically be ported over.

  • Added a model selection submenu to the AI content description context menu.

  • Changed some default options to make them more logical.

  • Now, when you trigger a description action, the model in use will be spoken.

I am aware that the way I've designed the model selection means that the prompt field is unnecessarily difficult to get to, a noteworthy regression. Next to bugs, that will be the first thing I'll resolve in the upcoming version, after which will hopefully come multi-turn conversation.

Anyway, that's enough talking. Here's the link to the add-on file directly: AIContentDescriber-2024.03.29.nvda-addon.

I'm looking forward to getting everyone's feedback!

2024-03-30 05:11:10

really, really wish this was free

i am a system, i have headmates, and that is my life, and my discord is rings2006wilson#8609

2024-03-30 06:42:35

Yeeeees, you are a godsend. Thank you so much for this! I'm extremely excited to try out Claude 3 Haiku. <3

2024-03-30 08:13:07

@cartertemm,
Would you consider adding an option to type in a prompt upon triggering one of the options? Sometimes you encounter an image that you want to know something specific about, and it would be easier to do that than edit the default prompt each time. The simplest might be a checkbox in the settings that when checked will bring up a TextEntryDialog with the default prompt whenever an image description is done.

2024-04-01 06:00:41

Totally, you are actually spot on with the implementation I have planned, except there will also be a list of saved prompts. I've noticed that most people seem to cycle through instruction sets depending on the task at hand, e.g. recording a professional-quality video will demand different feedback from attempting to deduce what an inaccessible button does, so the goal is to make it as straightforward as possible to toggle between them.

Another thing I've noticed is that while some of us really enjoy prompt engineering for the sake of seeing what we can squeeze out of the different models, the vast majority of this add-on's users just want something that works. I think there would be immense value in a repository of good prompts for different tasks, broken down by model and maintained by contributions from the community. Along with demonstrating everything that is possible, it would be good to document what works and where, so people aren't expected to pay for a model that may not be optimal for their primary use case. I have a couple of "describe selfie" prompts that work spectacularly in Be My AI and Claude, but Gemini completely chokes on the one I use for comics. They're interesting like that.
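
To make the repository idea concrete, here is a rough Python sketch of how prompts could be stored per model and per task, so that a lookup also shows where no working prompt has been documented. Everything in it is a placeholder: the model keys, the task names, the prompt wording, and the function names are invented for illustration and are not taken from the add-on.

```python
from typing import List, Optional

# Hypothetical entries: (model, task) -> prompt text.
PROMPTS = {
    ("claude", "selfie"): "Describe my position in the frame, the lighting, and my expression.",
    ("gemini", "selfie"): "Describe the person in this photo and how they are framed.",
    ("claude", "comic"): "Read this comic panel by panel, including speech bubbles.",
}


def find_prompt(model: str, task: str) -> Optional[str]:
    """Return the documented prompt for this model and task, or None if there is none."""
    return PROMPTS.get((model.lower(), task.lower()))


def models_for_task(task: str) -> List[str]:
    """List the models that have a documented prompt for the task, alphabetically."""
    return sorted(m for (m, t) in PROMPTS if t == task.lower())
```

With the placeholder data above, `models_for_task("comic")` would list only the one model with a comic prompt on record, which is the kind of "what works where" information such a repository could surface before anyone pays for a model.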

2024-04-01 10:57:08

Regarding this implementation, I thought something that could be pretty neat would be a way to cycle between prompts, like how you can choose different speech synthesis settings in NVDA. E.g., holding down the NVDA key plus another key, and using the arrow keys to switch between prompt profiles.

The example I thought of was, say, you're playing a video game, and you have a prompt specifically for navigation, and one for reading the interface. You label them "Navigation", and "GUI". And you could swap between the two modes quickly that way. There could be maybe like three different presets for each model?

Not sure, this is one way I thought could work pretty well.
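
As a rough illustration of the cycling idea, the following Python sketch keeps a list of named prompts and steps forward or backward through it with wrap-around, much like a settings ring. It only covers the bookkeeping; the key bindings and speech output are left out, and the class name and example prompts are invented for illustration.

```python
class PromptProfiles:
    """Keeps a list of named prompts and cycles through them with wrap-around."""

    def __init__(self, profiles):
        if not profiles:
            raise ValueError("at least one profile is required")
        self._profiles = list(profiles)  # list of (name, prompt) pairs
        self._index = 0

    @property
    def current(self):
        """The (name, prompt) pair that is active right now."""
        return self._profiles[self._index]

    def next(self):
        """Move to the following profile, wrapping to the first after the last."""
        self._index = (self._index + 1) % len(self._profiles)
        return self.current

    def previous(self):
        """Move to the preceding profile, wrapping to the last before the first."""
        self._index = (self._index - 1) % len(self._profiles)
        return self.current


# Hypothetical profiles matching the example above.
profiles = PromptProfiles([
    ("Navigation", "Describe the layout of the scene and where the exits are."),
    ("GUI", "Read out any menus, buttons, and on-screen text."),
])
```

Each arrow press would call `next()` or `previous()` and announce the returned name, so swapping between "Navigation" and "GUI" takes one keystroke.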

2024-04-03 06:00:08

Has any of this been updated? I've been trying to keep tabs on it, because I have the ChatGPT 4 addon installed and the key. Also, can you choose to use this with Claude, Gemini, or ChatGPT? I'm probably going to be switching to Claude because of its higher skill in some other aspects.

2024-04-03 06:09:32

if you read the topic you would know that yes, you can

2024-04-03 07:11:02

Here's a question. Is there a way, or could a way be implemented, to get information about just an overlay text dialogue? For example, when I use the Look Up Anything mod in Stardew Valley and then use AICD to read the screen, it describes the scene in game. Then it says there is an overlay and reads what is in the overlay. Then it goes back to describing the scene. There are times when I would just want it to tell me the info from the mod and that's it. If you want, I can post an example of this so you can see what I mean.

2024-04-09 02:00:10

@Sean-Terry01

I'm not too sure, and it might depend on the model, but you should be able to add something like "ignore the overlay and anything contained inside it" to your prompt.

2024-04-09 02:15:05

I thought I might have heard a bit back that you could ask a question from the result you get? Or, is that still being worked on?

2024-04-09 08:13:22

Ya at some point, but for now you can edit the prompt under preferences and tell it what you do and don't want included. For instance, I will change it for YouTube to "just describe the contents of the video; ignore any controls or other page elements". The models will usually follow these types of instructions.

2024-04-12 04:15:32 (edited by rings2006 2024-04-12 04:26:40)

Would love the option to enter a prompt when you hit the hotkey for one of the actions, and the ability to ask things about the picture after submitting it. Also, how do I get as much detail as JAWS Picture Smart AI?

2024-04-12 18:00:06

I just updated the addon, but I'm not able to see different models to try experimenting... which version should I be using to get access to this?

Discord: clemchowder633

2024-04-12 18:12:17 (edited by defender 2024-04-12 18:58:51)

Yeah, having a series of saved preset prompts that can be accessed with the 1 through 0 keys, for instance, either in combination with the NVDA key and other modifiers or in a layer setup, would be incredible!
I feel bad asking for so much without giving anything in return though. A paypal.me or ko-fi link would be nice.

2024-04-12 20:51:08

Yeah, I would also like to have the ability to toggle between different prompts. Reserving the whole number row would probably interfere with some other keystrokes, but setting up 2, 3 or 4 prompts and toggling between them with a single keystroke would be enough for me.
As an Estonian, I would often like to get images with Estonian text described. However, the English descriptions are much more accurate, so being able to switch prompts would make this much handier!
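
One way to picture the number-row idea with only a few slots reserved: NVDA identifies keyboard gestures with strings such as "kb:NVDA+shift+j", so a small table from gesture strings to prompt slots is enough to describe the bindings. The Python sketch below only builds that table. The NVDA+alt modifier and the selectPrompt script names are hypothetical choices, not something the add-on defines.

```python
def build_slot_gestures(slot_count, modifiers="NVDA+alt"):
    """Map keyboard gesture identifiers to hypothetical script names, one per prompt slot."""
    if not 1 <= slot_count <= 10:
        raise ValueError("slot_count must be between 1 and 10")
    # Number row order: 1 through 9, then 0 for the tenth slot.
    keys = "1234567890"[:slot_count]
    return {
        "kb:{}+{}".format(modifiers, key): "selectPrompt{}".format(slot)
        for slot, key in enumerate(keys, start=1)
    }


# Reserve only three keys, e.g. one English and one Estonian prompt plus a spare.
gestures = build_slot_gestures(3)
```

Keeping the slot count as a parameter means a user who needs only two or three prompts does not lose the rest of the number row to the add-on.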

2024-04-14 07:37:04

So you know the NVDA OCR viewer? Not sure of the name, but it's the viewer used with NVDA+R for OCR that doesn't quite remove focus. The AI addon should be able to use that.
