The humble screenshot could be the key to great AI assistants

To make the most of a world that is increasingly filled with AI-based tools, you should develop one habit: take screenshots. Take lots of screenshots. Take screenshots of anything. For all the talk about voice modes, omnipresent cameras, and the future of multimodality, pressing the buttons to save what you are looking at might be the most valuable digital behavior there is.

Screenshots are a universal way to capture digital information. You can capture almost anything (Netflix being a notable exception) in just a few clicks, then save and share the image to any device, app, or person. Johnny Bree, founder of the digital storage app Fabric, says the screenshot is a portable data format. “There’s no other portable software that allows you to move from one piece of software to another.”

A screenshot contains many details, including its source, its content, and the time in the corner. It also sends a simple but powerful message: this is important to me. There are countless new AI tools that aim to monitor the world, our lives, and everything else, and make sense of it all for us. These tools are mostly useless for a variety of reasons, but mainly because AI is good at identifying things and terrible at determining whether they are important. A screenshot gives the system a signal of value and tells it that it should pay attention.

A screenshot also puts you, the user, in control in a very important way. Mattias Deserti is the head of smartphone advertising at Nothing. “If I gave you access to my emails, my WhatsApps and everything, it would be a lot of noise,” he says. There is little reason to store every email you receive or webpage you visit, and that is before considering the privacy implications. What if you could train the system yourself instead? You would be able to feed it exactly the information that you want it to know.

Up until now, screenshots have been a blunt tool. You snap a screenshot, and it’s saved to your camera roll, where it likely languishes for eternity, forgotten. Don’t even get me started on all the screenshots that I take by accident, mostly of my lock screen. At best, you might be able to search for text within the image. More likely, you’ll have to scroll until you find it again.

To make screenshots useful, you must first determine what is actually in them. The text is the easy part: optical character recognition has long been a reliable way of detecting text in an image. AI models go one step further. You can search by a title, or just for “movies,” and find all your saved photos of posters, Fandango recommendations, TikTok results, and so on. Shenaz Zack, a Google product manager and member of the Pixel Screenshots team, says the team uses “an OCR” model along with entity detection and Gemini “to understand the context of the screen.”

There’s more to a screenshot than just the text. The right AI model will be able to tell from the green color that it is a screenshot from WhatsApp. It should be able to identify a website’s header logo, or understand when you are saving a Spotify track name, a Yelp review of a handyman, or an Amazon listing. With this information, a screenshot app can automatically organize your images. Even that is only the beginning.

Everything I have described so far adds up to a very good app for looking at screenshots, and no one really wants that, because it’s just another thing to check, or to forget to check. It gets much more interesting when your device or app can start using screenshots to help you remember what you captured, or even use that information to get things done.

For example, Nothing’s Essential Space app can create reminders based on what you save. Take a screenshot of a concert you want to attend, and it can remind you about it. Pixel Screenshots takes the idea one step further: if you save a concert listing, your Pixel phone will prompt you to play that band the next time you open Spotify. If you take a screenshot of an ID card or boarding pass, you may be asked to add it to the Wallet app. Zack says that screenshots can be used as input for anything else.

Mike Choi is an independent developer who built an app called Camp to help him use his own screenshots. He began by turning each screenshot into a card, with the important information stored alongside the image. “You have a screenshot, and there’s a bottom button that flips the card,” he says. The back of the card shows a map if the screenshot was of a location, or a preview if it was a song. The idea was that AI could generate the perfect UI on the fly, given an infinite number of different screenshots.

It seems that every tech company is working on ways to use AI on your behalf. The difference here is that you don’t need to write long instructions or chat back and forth with an assistant. You simply take a screenshot, and the system does the rest. Deserti says, “You’re creating a knowledge database. Today that knowledge is contained in your gallery with nothing happening.” He’s eager to reach the point where you can screenshot a concert date and Essential Space will prompt you to purchase tickets when they become available.

Making sense of screenshots isn’t always straightforward. Some things you will keep forever, such as an ID card that you use often. Other items, such as a concert poster or a parking permit, have a very short shelf life. How can an app tell the difference between a parking pass that you use every day for work and one that you used once at the airport and will never need again? Some of the screenshots on my phone came from WhatsApp, while others are Instagram memes that I wanted to share with friends. No one’s camera roll should ever be used against them. Many screenshot apps want you to add notes or organize your own images to give the system more information. It’s difficult to do this without destroying what makes screenshots so easy and seamless in the first place.

One way to solve this problem and make screenshots more useful is to collect additional context from your device. Companies like Google and Nothing have an advantage here because they make the device, so they can see what is happening when you take a screenshot. If you take a screenshot of your web browser, they can also save the link that you were viewing. They can also record your location, the weather, and the time. Sometimes that is useful, and sometimes it is nonsense. The more data these apps collect, the greater the risk that they run into the same noise problem that screenshots were supposed to solve.

The input system, at least, is working. We all take screenshots all the time, and they are a great way to mark important information. The hardest part of building a great AI assistant is getting access to relevant, personalized data. The future of computing will be multimodal, with cameras, microphones, and sensors of every kind. But the first best way to use AI could be one screenshot at a time.

www.aiobserver.co
