
Skyvern is a tool that automates browser-based workflows using Large Language Models (LLMs) and computer vision. Instead of relying on pre-defined selectors such as XPath, it uses visual elements and LLMs to navigate and interact with websites in real time. This makes Skyvern adaptable to new websites and resistant to layout changes, which improves the reliability of browser-based automation.
Main Points
Introduction to Skyvern
Skyvern automates browser-based workflows using LLMs and computer vision, providing a simple API endpoint to fully automate manual workflows.
How Skyvern operates
Skyvern uses computer vision and LLMs to parse the items in the viewport in real time, plan how to interact with them, and then carry out those interactions.
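The parse, plan, and interact cycle described above can be sketched as a small loop. This is only an illustration under stated assumptions, not Skyvern's actual code or API: the names `Element`, `Action`, `run_workflow`, `FakePage`, and `toy_plan` are all hypothetical, the "vision" step is a stub that returns a hard-coded list, and the "LLM" step is a simple rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Element:
    """An interactable item found in the viewport (hypothetical shape)."""
    element_id: int
    kind: str   # e.g. "input" or "button"
    label: str  # visible text or nearby caption


@dataclass
class Action:
    """One planned interaction (hypothetical shape)."""
    element_id: int
    verb: str                  # "type" or "click"
    text: Optional[str] = None


def run_workflow(
    goal: str,
    observe: Callable[[], List[Element]],
    plan: Callable[[str, List[Element]], List[Action]],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> int:
    """Repeat observe -> plan -> act until the planner returns no actions.

    Returns the number of actions executed.
    """
    executed = 0
    for _ in range(max_steps):
        elements = observe()            # stand-in for vision-based parsing
        actions = plan(goal, elements)  # stand-in for the LLM planning call
        if not actions:
            break
        for action in actions:
            act(action)
            executed += 1
    return executed


class FakePage:
    """In-memory stand-in for a browser page, used only for this sketch."""

    def __init__(self) -> None:
        self.fields: Dict[int, Optional[str]] = {}
        self.submitted = False

    def observe(self) -> List[Element]:
        # After submission nothing is left to interact with.
        if self.submitted:
            return []
        return [Element(0, "input", "Email"), Element(1, "button", "Submit")]

    def act(self, action: Action) -> None:
        if action.verb == "type":
            self.fields[action.element_id] = action.text
        elif action.verb == "click":
            self.submitted = True


def toy_plan(goal: str, elements: List[Element]) -> List[Action]:
    """Rule-based stand-in for an LLM: fill every input, click every button."""
    actions: List[Action] = []
    for el in elements:
        if el.kind == "input":
            actions.append(Action(el.element_id, "type", "user@example.com"))
        elif el.kind == "button":
            actions.append(Action(el.element_id, "click"))
    return actions
```

Calling `run_workflow("sign up", page.observe, toy_plan, page.act)` on a fresh `FakePage` runs one planning round, fills the field, clicks the button, and stops once the page reports nothing left to do. The point of the structure is that the planner only ever sees what is currently on screen, so nothing about a specific site is hard-coded into the loop.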
About Skyvern Cloud
Skyvern offers a managed cloud version that can run multiple Skyvern instances in parallel and adds features such as anti-bot detection mechanisms and CAPTCHA solving.
Insights
Skyvern can operate on websites it’s never seen before.
Skyvern is able to map visual elements to actions necessary to complete a workflow, without any customized code.
Skyvern is resistant to website layout changes.
The system does not look for predetermined XPaths or other selectors while navigating.
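A toy comparison can show why this matters. The sketch below is an assumption-laden illustration, not how Skyvern matches elements: the two layouts, the paths, and the functions `find_by_path` and `find_by_description` are hypothetical, and plain word overlap stands in for the vision and LLM matching that a real system would use.

```python
from typing import Dict, Optional

# Two hypothetical snapshots of the same form, before and after a redesign.
# Keys are structural paths (what a fixed selector targets);
# values are the labels a user would see on screen.
OLD_LAYOUT: Dict[str, str] = {
    "/form/div[1]/input": "Email",
    "/form/div[2]/button": "Get quote",
}
NEW_LAYOUT: Dict[str, str] = {
    "/form/section/div[3]/input": "Email address",
    "/form/footer/button": "Get a quote",
}


def find_by_path(layout: Dict[str, str], path: str) -> Optional[str]:
    """Selector-style lookup: succeeds only if the exact path still exists."""
    return layout.get(path)


def find_by_description(layout: Dict[str, str], description: str) -> Optional[str]:
    """Return the path whose visible label shares the most words with the description.

    Word overlap is a crude stand-in for visual and LLM-based matching.
    """
    wanted = set(description.lower().split())
    best_path: Optional[str] = None
    best_score = 0
    for path, label in layout.items():
        score = len(wanted & set(label.lower().split()))
        if score > best_score:
            best_path, best_score = path, score
    return best_path
```

With these definitions, the fixed path `/form/div[2]/button` resolves in `OLD_LAYOUT` but returns `None` in `NEW_LAYOUT`, while `find_by_description(layout, "get quote button")` locates the button in both layouts because it keys on what is visible rather than on where the element sits in the page structure.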
Skyvern leverages LLMs to cover complex situations.
Examples include inferring that a user was eligible to drive at 18 because they received their license at 16, and recognizing that slightly different-sized products at different stores are the same product.