GitHub - Skyvern-AI/skyvern: Automate browser-based workflows with LLMs and Computer Vision

Skyvern automates browser-based workflows using large language models (LLMs) and computer vision. Instead of relying on predefined selectors such as XPath, it reads the visual elements of a page and uses LLMs to navigate and interact with websites in real time. This lets Skyvern adapt to new websites and tolerate layout changes, which makes browser-based automation more reliable.

Main Points

Introduction to Skyvern

Skyvern automates browser-based workflows using LLMs and computer vision, providing a simple API endpoint to fully automate manual workflows.

How Skyvern operates

Skyvern uses computer vision and LLMs to parse the items in the viewport in real time, plan how to interact with them, and then carry out those interactions.
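The parse, plan, and interact cycle described above can be sketched as a small loop. This is only an illustration under stated assumptions, not Skyvern's actual code or API: `Element`, `Action`, `run_workflow`, `FakePage`, and `toy_plan` are all hypothetical names, the fake page stands in for a real browser viewport, and the rule-based planner stands in for the LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Element:
    """Hypothetical record for one item parsed from the viewport."""
    element_id: int
    label: str  # visible text or description
    kind: str   # "input", "button", "text", ...


@dataclass
class Action:
    """Hypothetical interaction the planner asks for."""
    kind: str        # "fill" or "click"
    element_id: int
    value: str = ""


def run_workflow(
    observe: Callable[[], List[Element]],
    plan: Callable[[List[Element]], List[Action]],
    act: Callable[[Action], None],
    is_done: Callable[[List[Element]], bool],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat observe -> plan -> act until the goal is reached or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        elements = observe()          # parse what is currently visible
        if is_done(elements):
            break
        actions = plan(elements)      # decide how to interact (LLM stand-in)
        if not actions:
            break                     # nothing useful to do; stop
        for action in actions:
            act(action)               # carry out the interaction
            history.append(action)
    return history


class FakePage:
    """In-memory stand-in for a browser page with one form."""

    def __init__(self) -> None:
        self.values: Dict[int, str] = {}
        self.submitted = False

    def observe(self) -> List[Element]:
        if self.submitted:
            return [Element(0, "Thank you", "text")]
        return [Element(0, "Email", "input"), Element(1, "Submit", "button")]

    def act(self, action: Action) -> None:
        if action.kind == "fill":
            self.values[action.element_id] = action.value
        elif action.kind == "click":
            self.submitted = True


def toy_plan(elements: List[Element], data: Dict[str, str]) -> List[Action]:
    """Rule-based planner: fill inputs we have data for, then click buttons."""
    fills = [Action("fill", e.element_id, data[e.label])
             for e in elements if e.kind == "input" and e.label in data]
    clicks = [Action("click", e.element_id)
              for e in elements if e.kind == "button"]
    return fills + clicks


if __name__ == "__main__":
    page = FakePage()
    history = run_workflow(
        page.observe,
        lambda els: toy_plan(els, {"Email": "a@example.com"}),
        page.act,
        lambda els: any(e.label == "Thank you" for e in els),
    )
    print([a.kind for a in history])
```

Because the page is observed again on every iteration, the loop reacts to whatever is on screen after each action rather than following a script recorded in advance.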

About Skyvern Cloud

Skyvern offers a managed cloud version that allows running multiple instances in parallel with added features like anti-bot detection and CAPTCHA solving.

Insights

Skyvern can operate on websites it’s never seen before.

Skyvern is able to map visual elements to actions necessary to complete a workflow, without any customized code.

Skyvern is resistant to website layout changes.

There are no pre-determined XPaths or other selectors the system is looking for while trying to navigate.
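To make the contrast with selector-based automation concrete, here is a toy comparison. It is a sketch under assumptions, not how Skyvern resolves elements: `VisibleElement`, `find_by_fixed_path`, and `find_by_intent` are hypothetical, the two layouts are made up, and a simple word-overlap score stands in for the visual and LLM-based matching.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VisibleElement:
    """Hypothetical element: where it sits in the DOM and what the user sees."""
    dom_path: str
    text: str


def find_by_fixed_path(elements: List[VisibleElement], path: str) -> Optional[VisibleElement]:
    """Selector-style lookup: succeeds only if the exact path still exists."""
    for element in elements:
        if element.dom_path == path:
            return element
    return None


def find_by_intent(elements: List[VisibleElement], keywords: List[str]) -> Optional[VisibleElement]:
    """Pick the element whose visible text shares the most words with the intent."""
    best: Optional[VisibleElement] = None
    best_score = 0
    for element in elements:
        words = set(element.text.lower().split())
        score = sum(1 for keyword in keywords if keyword.lower() in words)
        if score > best_score:
            best, best_score = element, score
    return best


# The same login button before and after a hypothetical redesign.
old_layout = [VisibleElement("/html/body/form/button[1]", "Sign in")]
new_layout = [
    VisibleElement("/html/body/div/div/button[1]", "Create account"),
    VisibleElement("/html/body/div/div/button[2]", "Sign in to your account"),
]

recorded_path = "/html/body/form/button[1]"

if __name__ == "__main__":
    print(find_by_fixed_path(old_layout, recorded_path))   # found
    print(find_by_fixed_path(new_layout, recorded_path))   # None: the path moved
    print(find_by_intent(new_layout, ["sign", "in"]))      # still found by its text
```

The fixed path breaks as soon as the markup is rearranged, while the lookup that works from what is visible keeps finding the button in both layouts.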

Skyvern leverages LLMs to cover complex situations.

Examples include inferring that a user was eligible to drive at 18 because they received their license at 16, and recognizing that slightly different-sized products at different stores are the same product.

URL

https://github.com/Skyvern-AI/skyvern