Large Language Models For Social Networks: Applications, Challenges, And Solutions

We categorize LLM applications for social networks into three categories. The first is knowledge tasks, where users want to find new knowledge and information, such as search and question answering. The second is engagement tasks, where users want to consume interesting content, such as entertaining notification content. The third is foundational tasks that must be done to moderate and operate the social networks, such as content annotation and LLM monitoring. For each task, we share the challenges we found, the solutions we developed, and the lessons we learned. To the best of our knowledge, this is the first comprehensive paper about developing LLM applications for social networks.

• Knowledge tasks: Tasks where users want to get new knowledge or new information, such as searching among social posts or asking questions to other social network users;

• Engagement tasks: Tasks where we use LLMs to increase user engagement, e.g., creating interesting content for notifications;

• Foundation tasks: Tasks that have horizontal impact across many applications. For example, building infrastructure to manage API usage and monitor LLM health, and building knowledge graphs with LLMs, belong to this type.
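As a minimal sketch of the API-usage and LLM-health monitoring mentioned in the foundation tasks, the wrapper below tracks latency and error rate over a sliding window. The `LLMHealthTracker` class, its window size, and the `llm_fn` callable are our own illustrative assumptions, not components described in the paper.

```python
import time
from collections import deque

class LLMHealthTracker:
    """Tracks LLM request latency and error rate over a sliding window,
    so API usage and model health can be monitored in one place."""

    def __init__(self, window_size=100):
        self.latencies = deque(maxlen=window_size)  # seconds per request
        self.outcomes = deque(maxlen=window_size)   # True = success

    def record(self, latency_s, success):
        self.latencies.append(latency_s)
        self.outcomes.append(success)

    def error_rate(self):
        if not self.outcomes:
            return 0.0
        return 1.0 - sum(self.outcomes) / len(self.outcomes)

    def avg_latency(self):
        if not self.latencies:
            return 0.0
        return sum(self.latencies) / len(self.latencies)

    def call(self, llm_fn, prompt):
        """Wrap an LLM call, recording its latency and success/failure."""
        start = time.monotonic()
        try:
            result = llm_fn(prompt)
            self.record(time.monotonic() - start, True)
            return result
        except Exception:
            self.record(time.monotonic() - start, False)
            raise
```

A monitoring job could periodically read `error_rate()` and `avg_latency()` and alert when either drifts past a threshold.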

We show that combining a RAG architecture with a model fine-tuned on domain-specific knowledge and context exhibits great potential for real-time question-answering applications. The LocalGPT system enriches the LLM's parametric knowledge with our domain-specific knowledge base. In the system design, decoupling model serving from frequent vector store updates provides a foundation for keeping the retrieved knowledge source fresh. Using up-to-date retrieved knowledge in-context makes it possible to answer time-sensitive questions and to dynamically update the ground-truth answers learned during knowledge injection training. From the modeling perspective, we build an end-to-end training framework and demonstrate the effectiveness of knowledge injection training for downstream Q&A that requires local knowledge. Through knowledge injection training, the fine-tuned LM learns to emphasize relevant, factual knowledge from the given context when performing in-context learning, which significantly increases generalization capability and reduces factual errors. We also find that relevant, up-to-date in-context documents have a bigger influence on retrieval-augmented fine-tuned models than on pre-trained LLMs in terms of surfacing factual answers. Our empirical studies show that at inference time, when the retrieved source is not relevant and the injected knowledge does not contain relevant information either, the fine-tuned model benefits from its pre-trained knowledge.
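The retrieval flow described above can be sketched as follows. This is a minimal illustration, not the LocalGPT implementation: the hashed bag-of-words `embed` function, the `VectorStore` class, and `build_prompt` are simplified stand-ins for a learned embedding model, a production vector store, and the real prompt template. The design point it mirrors is that the store is updated (via `upsert`) independently of model serving, keeping in-context knowledge fresh.

```python
import math

def embed(text, dim=64):
    """Toy embedding: tokens hashed into a normalized dense vector.
    A production system would use a learned embedding model instead."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class VectorStore:
    """Holds (id, text, embedding) triples. Because it is updated
    independently of model serving, retrieved knowledge stays fresh
    without re-deploying the model."""

    def __init__(self):
        self.docs = []

    def upsert(self, doc_id, text):
        # Replace any existing entry with the same id, then re-embed.
        self.docs = [d for d in self.docs if d[0] != doc_id]
        self.docs.append((doc_id, text, embed(text)))

    def search(self, query, k=2):
        # Rank documents by cosine similarity to the query embedding.
        q = embed(query)
        scored = sorted(
            self.docs,
            key=lambda d: -sum(a * b for a, b in zip(q, d[2])),
        )
        return [text for _, text, _ in scored[:k]]

def build_prompt(question, store):
    """Place retrieved, up-to-date documents in-context so the model
    can ground time-sensitive answers in local knowledge."""
    context = "\n".join(store.search(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

At serving time, only `build_prompt` output is sent to the (frozen) model, while a separate pipeline keeps calling `upsert` as new posts arrive.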

Limitation and future work: Even though the fine-tuned model with retrieval generates better answers to domain-specific questions and improves factual grounding based on our multi-dimensional evaluation, challenges and limitations remain. Retrieved knowledge provided in-context mitigates the hallucination issues associated with large language models (LLMs). However, we still cannot fully prevent hallucination, particularly when no parametric knowledge exists and no relevant documents are retrieved. Also, fine-tuning compromises the safety alignment of pre-trained LLMs, so adding blocklists or additional detection will be essential to filter out malicious instructions. There is a lot of opportunity to further improve the LocalGPT system. First, the latency introduced by OpenAI requests degrades performance on real-time Q&A; we can explore open-source models for more customized training and to reduce our dependency on closed-source LLMs. Second, the location-targeting issue for recommendation-related questions can be improved by increasing the recall of the retriever. Re-ranking and filtering before passing candidates to an LLM for final response generation can also improve the accuracy of the retrieval system. From the fine-tuning perspective, scaling up the diversity of the tasks and the size of the training set can potentially improve the generalization ability of the model. Also, a greater variety of preference data can be used to align the style and tone of the generated outputs with user preferences. Last but not least, better prompting strategies for incorporating external knowledge in-context are critical to improving model performance. We leave the exploration of these ideas to future investigations.
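The re-ranking and filtering step mentioned above can be sketched as follows. The `rerank` and `token_overlap` functions and the threshold value are illustrative assumptions; a real system would replace the lexical scorer with a cross-encoder or another fine-grained relevance model.

```python
def rerank(query, candidates, score_fn, threshold=0.1, top_k=3):
    """Re-rank a high-recall candidate set with a finer-grained scorer,
    then filter out weak matches before they reach the LLM context.
    `score_fn` stands in for a cross-encoder or similar relevance model."""
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda x: -x[0])
    return [c for s, c in scored[:top_k] if s >= threshold]

def token_overlap(query, doc):
    """Illustrative lexical scorer: fraction of query tokens in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0
```

The retriever can then cast a wide net for recall, while `rerank` keeps only candidates scoring above the threshold, so irrelevant text never enters the prompt.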

While LLM-generated summaries and extractions tend to produce more informative and customized content, this does not necessarily translate into user engagement and growth. Our experiments on push notifications show that using AIGC to increase user interactions through simple in-context learning is not a shortcut, since pre-trained LLMs are not optimized for user engagement. Without a reward model or task-specific training that tailors content generation to user preferences, in-context learning alone is unlikely to directly drive engagement-related metrics. Based on our experiments, LLM-generated content is best suited for quick prototypes of new products; directly using LLM-generated content to improve metrics on products that have already undergone many iterations of optimization is considerably harder.

• A well-summarized body in a push notification might contain sufficient information so that a user does not need to open the notification to understand the whole picture. This finding is consistent with the observation in email subject lines described in the previous sections. More informative content does not necessarily lead to more user engagement.

• The summarized content transforms first-person into third-person, using phrases such as "neighbor is..." or "neighbor asks". Such tone changes might alienate users and reduce their desire to engage with the content.

We note several potential reasons why LLM-generated content was not as effective as we had hoped. First, our control template had been improved over multiple iterations across many years. Second, the template was long, and we changed only one paragraph. Third, we do not know how much context each recipient has about their neighborhood; if a recipient is not familiar with what is going on in their Nextdoor neighborhood, our summarization would not sound interesting to them.

Even though LLMs can scale up the customized content generation process, the generic content they produce might not capture each individual's personal preferences. Without further training, it is still hard to use LLMs alone to generate user-preferred invitation letters and boost user growth.

Our future work will explore ways to improve the effectiveness of LLM-generated content in increasing user engagement and growth. Particularly, we want to test whether generating a larger portion of the email with LLMs will lead to more positive outcomes, and whether asking the LLM to use certain writing styles (humorous, intriguing, poetic, etc.) would make a difference.