Extreme Multi-Label Skill Extraction Training using Large Language Models
“We use an LLM to generate training data for skill extraction, grounded in the ESCO ontology. Based on this synthetic data, we optimize a model using contrastive learning to represent skill names and corresponding sentences close together in the same space. Our key contribution is a novel end-to-end approach to training a skill extraction system, consisting of the cost-effective synthetic data generation and the contrastive learning procedure alongside an effective augmentation procedure.”
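To make the contrastive objective concrete, here is a minimal sketch of one common formulation: an InfoNCE-style loss with in-batch negatives, where each synthetic sentence is pulled toward its own skill name and pushed away from the other skill names in the batch. The quoted summary does not specify the loss, the temperature, or the encoder, so the function name `info_nce_loss`, the `temperature` value, and the toy one-hot embeddings are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def info_nce_loss(sentence_emb, skill_emb, temperature=0.05):
    """In-batch contrastive loss (hypothetical helper, not from the paper).

    Row i of `sentence_emb` is assumed to be paired with row i of `skill_emb`;
    every other row in the batch serves as a negative.
    """
    # L2-normalise so the dot product below is cosine similarity.
    s = sentence_emb / np.linalg.norm(sentence_emb, axis=1, keepdims=True)
    k = skill_emb / np.linalg.norm(skill_emb, axis=1, keepdims=True)

    # Similarity of every sentence to every skill name, scaled by temperature.
    logits = (s @ k.T) / temperature

    # Numerically stable log-softmax over the skill names in the batch.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The correct skill for sentence i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


# Toy embeddings (assumed, for illustration only): four mutually orthogonal vectors.
sentences = np.eye(4)
skills_aligned = np.eye(4)                   # each sentence matches its skill
skills_shuffled = np.roll(np.eye(4), 1, 0)   # each sentence paired with a wrong skill

loss_aligned = info_nce_loss(sentences, skills_aligned)
loss_shuffled = info_nce_loss(sentences, skills_shuffled)
```

In this toy setup the loss is close to zero when each sentence embedding coincides with its skill embedding and large when the pairs are shuffled, which is the behaviour the training procedure relies on to place skill names and their sentences close together in the same space.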