Conceptual Design Generation Using Large Language Models
The following characteristics of the solutions were explored:
Feasibility: rated on an anchored scale from 0 (the technology does not exist to create the solution) to 2 (the solution can be implemented in the manner suggested).
Novelty: rated on an anchored scale from 0 (the concept is copied from a common and/or pre-existing solution) to 2 (the solution is new and unique). Of note, “novelty” is considered to be the uniqueness of the solution with respect to the existing design space and with respect to the entire generated solution set.
Usefulness: rated on an anchored scale from 0 (completely off-topic and not related to the solution at all) to 2 (the solution is helpful given the context of the prompt).
Two experts, both specializing in design theory and methodology, were trained to perform all ratings for both the GPT-3 and the crowdsourced design solutions. Inter-rater consistency was assessed with Cohen's kappa on a subsample of the data: 20 solutions were evaluated for each design prompt and mode of generation (120 designs in total), which indicated fair-to-moderate agreement between the two evaluators.
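As an illustration of the agreement statistic named above, the following is a minimal sketch of unweighted Cohen's kappa for two raters scoring the same items on the 0 to 2 scale. The function name and the example ratings are hypothetical and are not taken from the study. The source does not say whether a weighted or unweighted kappa was used, so the unweighted form is an assumption here.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters over the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance from each
    rater's marginal category frequencies.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("Both raters must score the same non-empty item set.")

    n = len(ratings_a)

    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of the raters' marginal proportions per category.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores (0, 1, or 2) from two evaluators on six solutions.
rater_1 = [0, 0, 1, 1, 2, 2]
rater_2 = [0, 0, 1, 2, 2, 2]
print(round(cohens_kappa(rater_1, rater_2), 2))  # about 0.75
```

Under the commonly cited Landis and Koch interpretation bands, values of 0.21 to 0.40 are described as fair and 0.41 to 0.60 as moderate agreement. Because the rating scale is ordinal, a weighted kappa would be a reasonable alternative to this sketch.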
With the use of few-shot learning, LLMs are capable of generating design solutions that are similar to crowdsourced solutions, but these modifications reduce the diversity of the solutions that LLMs generate. Expert evaluations reveal that LLMs generate solutions that are more feasible and useful than crowdsourced solutions, but less novel. This paper provides a foundation for future work to explore the use of LLMs in providing conceptual design ideas to designers.