My First Local LLM
My name is Aoyama and I work in the Service Reliability Group (SRG) of the Media Headquarters.
SRG (Service Reliability Group) is a group that mainly provides cross-sectional support for the infrastructure of our media services, improving existing services, launching new ones, and contributing to OSS.
This article is the 8th day's entry in the CyberAgent Group SRE Advent Calendar 2024.
I've summarized the questions I had, starting with "What is a Local LLM?" I hope this will be helpful for those considering adopting a Local LLM.
- What is a Local LLM?
- Local LLM Use Cases
- Model Types
- What is Hallucination in LLM?
- What is RAG (Retrieval-Augmented Generation)?
- Trying out local LLM (Microcosm beta version)
- Conclusion
What is a Local LLM?
A local LLM is a large-scale language model (LLM) that can be run in a local environment, such as on your own computer or server. While LLMs like ChatGPT usually run on the cloud, local LLMs have the following features:
- Runs locally: You can download the model and run it on your PC or server. You can even use it without an internet connection (see the sketch after this list).
- Privacy protection: No data is sent to the cloud, so you can rest assured when handling confidential or personal information.
- Customizable: You can fine-tune the model to suit your needs and train it on your specific data.
- Cost savings: By not using commercial cloud services, you avoid pay-per-use charges. However, if you require high-performance hardware, you will incur costs.
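To make "runs locally" concrete, here is a minimal sketch using Hugging Face's Transformers library. The model name and prompt are just illustrative; after the first download, inference runs entirely on your own machine:

```python
# Minimal sketch: run a small open model entirely on your own machine.
# Requires: pip install transformers torch accelerate
# (transformers >= 4.41 for native Phi-3 support)
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",  # cached locally after the first download
    device_map="auto",                         # GPU if available, otherwise CPU
)

result = generator("Explain what a local LLM is in one sentence.", max_new_tokens=64)
print(result[0]["generated_text"])
```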
Local LLM Use Cases
- AI application development for personal projects
- Analysis and tool building using data within the company
- Using AI in offline environments
While implementation requires some technical knowledge, a local LLM is a compelling option for those who want to utilize advanced AI technology while keeping their data private.
Model Types
Some of the most common models for building an LLM (large-scale language model) environment locally include:
- LLaMA (Large Language Model Meta AI): A family of models developed by Meta (formerly Facebook); the latest model, Llama 3, was released in April 2024. It is recommended for research purposes.
- BERT (Bidirectional Encoder Representations from Transformers): A deep learning model for natural language processing developed by Google. It excels at language understanding tasks, capturing the meaning and context of sentences.
- BLOOM: An open-source multilingual large-scale language model developed by BigScience, a collaborative AI research workshop.
- Phi-3: An open-source, high-performance small language model (SLM) developed by Microsoft. Phi-3 is said to significantly outperform other language models of the same parameter size.
These models are easily available through libraries such as Hugging Face's Transformers library, and are relatively easy to deploy and customize locally.
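As a quick illustration of how easy this is, here is a sketch that pulls BERT down through the Transformers library and runs a fill-in-the-blank query (an understanding task, in contrast to the generation-oriented models above); the input sentence is just an example:

```python
# Sketch: BERT is an encoder model suited to understanding tasks such as
# fill-in-the-blank, rather than free-form text generation.
# Requires: pip install transformers torch
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("A local LLM runs on your own [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))
```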
What is Hallucination in LLM?
One problem with LLMs (large language models) is "hallucination": the phenomenon where the model generates facts or information that do not actually exist, i.e., it gives an incorrect answer with high confidence for a given input.
For example, if you ask microsoft/Phi-3-mini-4k-instruct-gguf for the recipe for pain d'épices, a French Christmas treat, you'll get the following response:

☆ Example of an actual pain d'épices recipe
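For reference, a query like this can also be reproduced outside a GUI app with the llama-cpp-python bindings. This is only a sketch; the GGUF file name below is an assumption, so check the repository for the available quantizations:

```python
# Sketch: querying the same GGUF model locally with llama-cpp-python.
# Requires: pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

# Downloads the quantized file from Hugging Face on first use; the exact
# file name is an assumption, so check the repo for available files.
llm = Llama.from_pretrained(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf",
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Please give me a recipe for pain d'épices."}],
    max_tokens=256,
)
# Without external knowledge, the model may confidently invent ingredients and steps.
print(out["choices"][0]["message"]["content"])
```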
What is RAG (Retrieval-Augmented Generation)?
One method to mitigate this "hallucination" is called RAG (Retrieval-Augmented Generation).
This is an approach that combines generative AI (specifically large-scale language models, or LLMs) with information retrieval capabilities. RAG allows a model to obtain information from external knowledge sources (e.g., databases, documents, web pages) and use that information to generate responses or sentences.
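Here is a minimal sketch of the idea. Retrieval is plain TF-IDF here (production systems usually use embedding-based search), and the documents and question are illustrative:

```python
# Minimal RAG sketch: retrieve the most relevant passage, then put it in the prompt.
# Requires: pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny stand-in knowledge base; in practice these are chunks of your documents.
documents = [
    "Pain d'épices is a French spice bread made with rye flour, honey, and spices.",
    "RAG combines a language model with an external knowledge source.",
]
question = "What is pain d'épices made of?"

# 1. Retrieval: rank the documents by similarity to the question.
vectors = TfidfVectorizer().fit_transform(documents + [question])
scores = cosine_similarity(vectors[-1], vectors[:-1])
best_doc = documents[scores.argmax()]

# 2. Augmented generation: hand the retrieved context to a local LLM.
prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {question}"
print(prompt)  # this prompt would then be sent to a model such as Phi-3
```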
Trying out local LLM (Microcosm beta version)
I decided to try out the beta version of the local AI application "Microcosm," which was just released in November 2024.
The comprehensive Japanese manual means you can complete everything from installation to model configuration without any confusion.
If you select the model microsoft/Phi-3-mini-4k-instruct-gguf and ask a question about pain d'épices, you can experience the "hallucination" for yourself.

To address this problem, you can register RAG data, using the menu on the right side of the screen.
You can also select a directory, so this time we will register a few recipe PDFs from the web along with a PDF of the relevant Wikipedia content.
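If you want to prepare a similar corpus by hand, extracting text from PDFs takes only a few lines. Here is a sketch with the pypdf library; the file names are placeholders:

```python
# Sketch: extracting text from recipe PDFs so they can be indexed as RAG data.
# Requires: pip install pypdf
from pypdf import PdfReader

corpus = []
for path in ["recipe.pdf", "wikipedia_pain_depices.pdf"]:  # placeholder file names
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    corpus.append(text)

print(f"Loaded {len(corpus)} documents for retrieval")
```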

This is the RAG data created after importing.
From this screen, you can delete or edit the imported data, or add new data.

Apply RAG from the side menu on the home screen.

When I ask about the same pain d'épices, I get a much more accurate answer.
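Conceptually, this is what applying RAG changed: the retrieved passage is now handed to the model along with the question. The sketch below mimics that flow with llama-cpp-python; it illustrates the idea rather than Microcosm's actual internals, and the retrieved passage is hard-coded for brevity:

```python
# Sketch of a RAG-grounded query: the retrieved context is injected into the
# prompt before the local model answers.
# Requires: pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf",  # assumed file name; check the repo
    n_ctx=4096,
)

context = (
    "Pain d'épices is a French spice bread made from rye flour, honey, "
    "and spices such as anise and cinnamon."  # illustrative retrieved passage
)
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": "What goes into pain d'épices?"},
    ],
    max_tokens=200,
)
print(out["choices"][0]["message"]["content"])
```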

Conclusion
We took a quick look at local LLMs and RAG, which are useful in situations where you want to use an LLM for business purposes but do not want to send data to the cloud.
We would like to express our sincere gratitude to Number One Solutions Co., Ltd., who kindly allowed us to use images from the beta version of "Microcosm" and to link to its manual in writing this article.
SRG is looking for people to work with us.
If you're interested, please contact us here.