This article presents Design2Code, a systematic study of whether front-end development can be automated by converting visual designs directly into code with generative AI. It introduces a benchmark for evaluating multimodal large language models (LLMs) on this task and finds that GPT-4V in particular shows promise for replacing manual front-end development in some cases.

Main Points

Advancements in Generative AI for Design2Code

Generative AI has significantly advanced, leading to a new paradigm in front-end development where visual designs can be directly converted into code, a task known as Design2Code.

Benchmarking Design2Code Capabilities

A benchmark of 484 real-world webpages was curated, along with automatic evaluation metrics, to assess the capabilities of multimodal LLMs in converting designs to code.

GPT-4V Outperforms in Automating Front-End Engineering

GPT-4V and Gemini Vision Pro were tested, with GPT-4V emerging as the superior model for this task, showing potential to automate front-end engineering.
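The sketch below illustrates the basic shape of this setup: a webpage screenshot is sent to a vision-capable model, which is asked to return a single HTML file reproducing the design. This is a minimal illustration only, assuming the OpenAI Python SDK; the model name, prompt wording, and helper function are placeholders, not the paper's exact pipeline.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def screenshot_to_html(screenshot_path: str) -> str:
    """Ask a multimodal model to reproduce a webpage screenshot as HTML (illustrative only)."""
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in for a GPT-4V-class vision model; not the paper's exact setting
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "Generate a single self-contained HTML file (with inline CSS) "
                            "that reproduces this webpage screenshot as closely as possible."
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
    )
    return response.choices[0].message.content
```

In practice the generated HTML is then rendered and compared against the original screenshot, which is where the evaluation metrics described under Insights come in.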

Insights

GPT-4V-generated webpages are judged better than the original reference webpages in 64% of cases.

Both human evaluation and automatic metrics show that GPT-4V is the clear winner on this task: annotators judge that GPT-4V-generated webpages can replace the original reference webpages in 49% of cases in terms of visual appearance and content, and, perhaps surprisingly, in 64% of cases the GPT-4V-generated webpages are considered better than even the original reference webpages.

Human and automatic evaluations confirm the superiority of GPT-4V in webpage generation.

This assessment is supported by both human evaluation and automatic metrics. The automatic evaluation is based on high-level visual similarity (CLIP) and low-level element matching (block-match, text, position, color).
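The high-level visual similarity component can be illustrated with a short sketch: embed a screenshot of the reference page and of the generated page with an off-the-shelf CLIP model and take the cosine similarity of the embeddings. This is a minimal sketch assuming the Hugging Face transformers CLIP implementation and a generic checkpoint; it is not the paper's exact metric code, and the low-level element matching (block-match, text, position, color) is not shown.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Generic public CLIP checkpoint; the paper may use a different backbone.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def clip_visual_similarity(reference_png: str, generated_png: str) -> float:
    """Cosine similarity between CLIP image embeddings of two webpage screenshots."""
    images = [Image.open(reference_png), Image.open(generated_png)]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)  # normalize so the dot product is cosine similarity
    return float((emb[0] @ emb[1]).item())
```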

URL

https://salt-nlp.github.io/Design2Code/