My First Local LLM

My name is Aoyama and I work in the Service Reliability Group (SRG) of the Media Headquarters.
#SRG (Service Reliability Group) mainly provides cross-sectional support for the infrastructure of our media services: improving existing services, launching new ones, contributing to OSS, and more.
 
This article is the 8th entry in the CyberAgent Group SRE Advent Calendar 2024. I started from the question "What is a local LLM?" and summarized what I learned along the way. I hope this will be helpful for those who are considering introducing a local LLM.
 

What is a Local LLM?


A local LLM is a large-scale language model (LLM) that can be run in a local environment, such as on your own computer or server. Normally, LLMs like ChatGPT run on the cloud, but a local LLM has the following features:
  1. Run locally: You can download the model and run it on your own PC or server. You can also use it without an internet connection.
  1. Privacy protection: No data is sent to the cloud, so you can rest assured when handling confidential or personal information.
  1. Customizable: You can fine-tune the model to suit your needs or train it on your specific data.
  1. Cost savings: Since you don't use commercial cloud services, you can avoid pay-per-use charges. However, if you need high-performance hardware, you will have to pay for it.
 

Local LLM Use Cases


  • AI application development for personal projects
  • Analysis and tool building using data within a company
  • Use of AI in offline environments
Although implementation requires some technical knowledge, a local LLM is a viable option for those who want to leverage advanced AI technology while protecting privacy.
 

Model Type


Some of the most common models that can be run as a local LLM include:
 
  1. LLaMA(Large Language Model Meta AI): This model was developed by Meta (formerly Facebook), and the latest AI model, "Llama3," was announced in April 2024. It is recommended for use in research.
  1. BERT(Bidirectional Encoder Representations from Transformers): A deep learning model for natural language processing developed by Google. BERT can perform language understanding tasks and understand the meaning and context of a sentence.
  1. BLOOM: An open-source, multilingual large language model developed by BigScience, a collaborative AI research workshop.
  1. Phi-3: An open-source, high-performance SLM (small language model) developed by Microsoft. Phi-3 is said to perform significantly better than other language models of the same parameter size.
 
These models are easily available through, for example, Hugging Face's Transformers library, and are relatively easy to deploy and customize locally.
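As a hedged sketch of what this looks like in practice (the model ID and generation parameters below are examples, not a tested configuration), loading a model with the Transformers `pipeline` API takes only a few lines:

```python
# Minimal sketch: running a text-generation model locally with the
# Transformers library (pip install transformers torch).
# The first run downloads the weights to a local cache; after that,
# inference runs entirely on your own machine.
from transformers import pipeline

MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"  # example model ID

def main() -> None:
    # Build a text-generation pipeline from the locally cached model
    generator = pipeline("text-generation", model=MODEL_ID)
    result = generator("What is a local LLM?", max_new_tokens=64)
    print(result[0]["generated_text"])

if __name__ == "__main__":
    main()
```

Smaller models such as Phi-3-mini can run on a single consumer GPU or even CPU; larger models require correspondingly more memory and compute.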
 

What is Hallucination in LLM?


 
 
A well-known problem with LLMs (large language models) is "hallucination": the phenomenon where a model generates facts or information that do not actually exist. In other words, the model gives a wrong answer with high confidence for a given input.
 
For example, if you ask microsoft/Phi-3-mini-4k-instruct-gguf about the recipe for pain d'épices, a French Christmas dessert, you'll get the following response:
 
 
☆ Actual recipe for pain d'épices
 

What is RAG (Retrieval-Augmented Generation)?


A technique called RAG (Retrieval-Augmented Generation) can help mitigate this hallucination.
RAG is an approach that combines generative AI (specifically, large language models) with information retrieval. It allows a model to pull information from external knowledge sources (e.g. databases, documents, web pages) and use that information when generating responses.
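The core idea can be shown with a toy sketch in plain Python. The documents and the word-overlap scoring below are purely illustrative (real RAG systems use vector embeddings and a search index), but the flow is the same: retrieve relevant snippets, then prepend them to the prompt so the model answers from supplied text instead of guessing.

```python
# Toy RAG sketch: retrieve the most relevant snippet from a small local
# "knowledge base" and build an augmented prompt for the LLM.
# Scoring by word overlap is a stand-in for real embedding-based search.

DOCUMENTS = [
    "Pain d'épices is a French spiced loaf made with rye flour, honey and spices.",
    "BERT is a language-understanding model developed by Google.",
    "RAG combines a retriever with a generator to ground LLM answers.",
]

def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Prepend the retrieved context so the model answers from it."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What is pain d'épices?", DOCUMENTS)
print(prompt)
```

The augmented prompt is then passed to the local model as usual; the model's answer is grounded in the retrieved snippet rather than in whatever it memorized during training.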
 

Trying out local LLM (Microcosm beta version)


I decided to try out the beta version of the local AI application "Microcosm," which was just released in November 2024.
 
The manual is comprehensive in Japanese, so you can complete everything from installation to model configuration without any trouble.
 
 
Selecting the model microsoft/Phi-3-mini-4k-instruct-gguf and asking about pain d'épices produces a hallucination.
 
To address this, you can register RAG data from the menu on the right side.
Since you can also select a whole directory, this time I registered some recipe PDFs from the web along with a PDF of the Wikipedia article.
 
This is the resulting RAG data after import.
From this screen you can delete, edit, or add to the imported data.
 
 
Apply the RAG from the side menu on the home screen.
 
When I ask about the same pain d'épices, I now get a much more accurate answer.
 
 
 

Conclusion


We took a quick look at local LLMs and RAG, which are useful in situations where you want to use an LLM for business purposes but do not want to send data to the cloud.
 
We would like to express our sincere gratitude to Number One Solutions Co., Ltd., who kindly agreed to allow the use of images from the beta version of "Microcosm" and to link to the manual in creating this article.
 
SRG is looking for people to work with us. If you are interested, please contact us here.