My First Local LLM
My name is Aoyama and I work in the Service Reliability Group (SRG) of the Media Headquarters.
SRG (Service Reliability Group) is a group that mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, and contributing to OSS.
This article is the 8th day's entry in the CyberAgent Group SRE Advent Calendar 2024.
I've summarized the questions I had, starting with "What is a Local LLM?" I hope this will be helpful for those considering adopting a Local LLM.
- What is a Local LLM?
- Local LLM Use Cases
- Model Types
- What is Hallucination in LLM?
- What is RAG (Retrieval-Augmented Generation)?
- Trying out local LLM (Microcosm beta version)
- Conclusion
What is a Local LLM?
A local LLM is a large-scale language model (LLM) that can be run in a local environment, such as on your own computer or server. While LLMs like ChatGPT usually run on the cloud, local LLMs have the following features:
- Runs locally: You can download the model and run it on your PC or server. You can even use it without an internet connection (see the sketch after this list).
- Privacy protection: No data is sent to the cloud, so you can rest assured when handling confidential or personal information.
- Customizable: You can fine-tune the model to suit your needs and train it on your specific data.
- Cost savings: By not using commercial cloud services, you avoid pay-per-use charges. However, if you require high-performance hardware, you will incur costs.
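To make "runs locally" concrete, here is a minimal sketch using Hugging Face's Transformers library. The model name and prompt are just illustrative; after the first download, inference runs entirely on your own machine:

```python
# Minimal sketch: run a small open model entirely on your own machine.
# Requires: pip install transformers torch accelerate
# (transformers >= 4.41 for native Phi-3 support)
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",  # cached locally after the first download
    device_map="auto",                         # GPU if available, otherwise CPU
)

result = generator("Explain what a local LLM is in one sentence.", max_new_tokens=64)
print(result[0]["generated_text"])
```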
Local LLM Use Cases
- AI application development for personal projects
- Analysis and tool building using data within the company
- Using AI in offline environments
While implementation requires some technical knowledge, a local LLM is a compelling option for those who want to utilize advanced AI technology while keeping their data private.
Model Types
Some of the most common models for building an LLM (large-scale language model) environment locally include:
- LLaMA (Large Language Model Meta AI): A family of models developed by Meta (formerly Facebook); the latest model, Llama 3, was released in April 2024. It is recommended for research purposes.
- BERT (Bidirectional Encoder Representations from Transformers): A deep learning model for natural language processing developed by Google. It excels at language understanding tasks, capturing the meaning and context of sentences.
- BLOOM: An open-source multilingual large-scale language model developed by BigScience, a collaborative AI research workshop.
- Phi-3: An open-source, high-performance small language model (SLM) developed by Microsoft. Phi-3 is said to significantly outperform other language models of the same parameter size.
These models are easily available through libraries such as Hugging Face's Transformers library, and are relatively easy to deploy and customize locally.
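As a quick illustration of how easy this is, here is a sketch that pulls BERT down through the Transformers library and runs a fill-in-the-blank query (an understanding task, in contrast to the generation-oriented models above); the input sentence is just an example:

```python
# Sketch: BERT is an encoder model suited to understanding tasks such as
# fill-in-the-blank, rather than free-form text generation.
# Requires: pip install transformers torch
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("A local LLM runs on your own [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))
```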
What is Hallucination in LLM?
One problem with LLMs (large language models) is "hallucination": the phenomenon where the model generates facts or information that do not actually exist, i.e., it gives an incorrect answer with high confidence for a given input.
For example, if you ask microsoft/Phi-3-mini-4k-instruct-gguf for the recipe for pain d'épices, a French Christmas treat, you'll get the following response:

☆ Example of an actual pain d'épices recipe
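For reference, a query like this can also be reproduced outside a GUI app with the llama-cpp-python bindings. This is only a sketch; the GGUF file name below is an assumption, so check the repository for the available quantizations:

```python
# Sketch: querying the same GGUF model locally with llama-cpp-python.
# Requires: pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

# Downloads the quantized file from Hugging Face on first use; the exact
# file name is an assumption, so check the repo for available files.
llm = Llama.from_pretrained(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf",
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Please give me a recipe for pain d'épices."}],
    max_tokens=256,
)
# Without external knowledge, the model may confidently invent ingredients and steps.
print(out["choices"][0]["message"]["content"])
```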
What is RAG (Retrieval-Augmented Generation)?
One method to mitigate this "hallucination" is called RAG (Retrieval-Augmented Generation).
This is an approach that combines generative AI (specifically large-scale language models, or LLMs) with information retrieval capabilities. RAG allows a model to obtain information from external knowledge sources (e.g., databases, documents, web pages) and use that information to generate responses or sentences.
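Here is a minimal sketch of the idea. Retrieval is plain TF-IDF here (production systems usually use embedding-based search), and the documents and question are illustrative:

```python
# Minimal RAG sketch: retrieve the most relevant passage, then put it in the prompt.
# Requires: pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny stand-in knowledge base; in practice these are chunks of your documents.
documents = [
    "Pain d'épices is a French spice bread made with rye flour, honey, and spices.",
    "RAG combines a language model with an external knowledge source.",
]
question = "What is pain d'épices made of?"

# 1. Retrieval: rank the documents by similarity to the question.
vectors = TfidfVectorizer().fit_transform(documents + [question])
scores = cosine_similarity(vectors[-1], vectors[:-1])
best_doc = documents[scores.argmax()]

# 2. Augmented generation: hand the retrieved context to a local LLM.
prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {question}"
print(prompt)  # this prompt would then be sent to a model such as Phi-3
```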
Trying out local LLM (Microcosm beta version)
I decided to try out the beta version of the local AI application "Microcosm," which was just released in November 2024.
The comprehensive Japanese manual means you can complete everything from installation to model configuration without any confusion.
If you select the model microsoft/Phi-3-mini-4k-instruct-gguf and ask a question about pain d'épices, you can experience the "hallucination" for yourself.

To address this problem, you can register RAG data, using the menu on the right side of the screen.
You can also select a directory, so this time we will register a few recipe PDFs from the web along with a PDF of the relevant Wikipedia content.
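If you want to prepare a similar corpus by hand, extracting text from PDFs takes only a few lines. Here is a sketch with the pypdf library; the file names are placeholders:

```python
# Sketch: extracting text from recipe PDFs so they can be indexed as RAG data.
# Requires: pip install pypdf
from pypdf import PdfReader

corpus = []
for path in ["recipe.pdf", "wikipedia_pain_depices.pdf"]:  # placeholder file names
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    corpus.append(text)

print(f"Loaded {len(corpus)} documents for retrieval")
```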

This is the RAG data created after importing.
From this screen, you can delete or edit the imported data, or add new data.

Apply RAG from the side menu on the home screen.

When I ask about the same pain d'épices, I get a much more accurate answer.
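Conceptually, this is what applying RAG changed: the retrieved passage is now handed to the model along with the question. The sketch below mimics that flow with llama-cpp-python; it illustrates the idea rather than Microcosm's actual internals, and the retrieved passage is hard-coded for brevity:

```python
# Sketch of a RAG-grounded query: the retrieved context is injected into the
# prompt before the local model answers.
# Requires: pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf",  # assumed file name; check the repo
    n_ctx=4096,
)

context = (
    "Pain d'épices is a French spice bread made from rye flour, honey, "
    "and spices such as anise and cinnamon."  # illustrative retrieved passage
)
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": "What goes into pain d'épices?"},
    ],
    max_tokens=200,
)
print(out["choices"][0]["message"]["content"])
```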

Conclusion
We took a quick look at local LLMs and RAG, which are useful in situations where you want to use an LLM for business purposes but do not want to send data to the cloud.
We would like to express our sincere gratitude to Number One Solutions Co., Ltd., who kindly allowed us to use images from the beta version of "Microcosm" and to link to its manual in writing this article.
SRG is looking for people to work with us.
If you're interested, please contact us here.