Building a Local Question-Answering System with LangChain and Llama3.2:
Artificial Intelligence is revolutionizing how we interact with data. Today, I’ll show you how to use LangChain and Llama3.2:1b to build a local, privacy-friendly question-answering system tailored for FlightScope, a pioneer in golf technology.
This system leverages Retrieval-Augmented Generation (RAG) to deliver accurate, engaging, and personalized answers about FlightScope’s innovative products, such as the Mevo+ launch monitor.
Let’s dive into this step-by-step guide and explore how to build a RAG app with a local open-source LLM: retrieval-chains.js!
Why Llama3.2:1b?
Before we get started, let’s explore why Llama3.2:1b is perfect for this task:
Massive context length (131,072 tokens): the model can process large amounts of input, making it well suited to complex, multi-document queries.
High-dimensional representations (2048 dimensions): the model’s 2048-dimensional hidden states capture detailed textual semantics, supporting precise retrieval of relevant information.
Efficient feed-forward length (8192): the compact feed-forward layers keep responses fast and accurate, even in nuanced scenarios.
These features make Llama3.2:1b an exceptional choice for building efficient, local RAG systems.
Here is a diagram that shows the various steps for setting up and building the RAG App:
Section A: Installing and Configuring Ollama
Install Ollama
Ollama makes it easy to run open-source models locally. Start by installing it:
(or install it directly from the ollama.ai site for your OS)
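On Linux or macOS, a common way to install Ollama is via the official install script (check the Ollama site for the current command for your OS):

```shell
# Download and run the official Ollama installer
curl -fsSL https://ollama.com/install.sh | sh
```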
Pull the Required Models
Once installed, download the models we’ll use:
llama3.2:1b for LLM use
nomic-embed-text for Embeddings in our Vectorstore
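The two models above can be pulled with the Ollama CLI:

```shell
# Small Llama 3.2 model for generation
ollama pull llama3.2:1b
# Embedding model for the vector store
ollama pull nomic-embed-text
```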
Start the Ollama Server
Run the server on port 11434:
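Ollama listens on port 11434 by default:

```shell
# Starts the Ollama server on http://localhost:11434
ollama serve
```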
This will make the models available locally for our app.
Section B: Setting Up the Environment
Using your favorite code editor, go ahead and start a new project. (I use VS Code.)
Create a .env file in your project’s root directory. This file will store environment variables for easy configuration.
Here’s what your .env file should look like:
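A minimal .env might contain just the Ollama endpoint. The variable name below is an assumption; match it to whatever your code reads:

```shell
# .env — local Ollama endpoint (default port)
OLLAMA_BASE_URL=http://localhost:11434
```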
Section C: Installing Dependencies
For this project, I used pnpm, a fast and efficient package manager.
P.S. You can also use npm or yarn if you prefer.
Install pnpm (if not already installed):
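One simple way to install pnpm is through npm:

```shell
# Install pnpm globally
npm install -g pnpm
```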
Install the Required Packages:
Run the following command to install the necessary dependencies:
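The exact dependency list depends on your LangChain version; a typical set for this kind of app (current LangChain JS package layout assumed) is:

```shell
# Core LangChain packages, the Ollama integration, the Cheerio scraper, and dotenv
pnpm add langchain @langchain/core @langchain/community @langchain/ollama cheerio dotenv
```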
Section D: Full Code for our App: retrieval-chains.js
Here is the complete code for retrieval-chains.js, presented step by step in the order it runs.
1. Inputs and Configuration
We’ll start by loading environment variables and importing necessary packages:
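A sketch of the setup step. The import paths follow the current LangChain JS package layout and may need adjusting to your installed versions; the OLLAMA_BASE_URL variable name is an assumption matching the .env example above:

```javascript
// retrieval-chains.js — load environment variables and import dependencies.
import "dotenv/config";

import { ChatOllama, OllamaEmbeddings } from "@langchain/ollama";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { createRetrievalChain } from "langchain/chains/retrieval";

// Read the Ollama endpoint from .env, falling back to the default port.
const OLLAMA_BASE_URL = process.env.OLLAMA_BASE_URL ?? "http://localhost:11434";
```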
2. Setting Up the LLM
Next, set up the Llama3.2 LLM and configure a friendly, engaging prompt:
(P.S. Remember to use template-literal backticks ` around the prompt template string.)
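A minimal sketch of this step, assuming the imports and OLLAMA_BASE_URL constant from the configuration step; the prompt wording is illustrative:

```javascript
// Set up the local Llama3.2:1b model served by Ollama.
const model = new ChatOllama({
  baseUrl: OLLAMA_BASE_URL,
  model: "llama3.2:1b",
  temperature: 0.7,
});

// Friendly system prompt; {context} and {input} are filled in by the chain.
const prompt = ChatPromptTemplate.fromTemplate(`
You are a friendly and enthusiastic FlightScope product expert.
Answer the user's question using only the context provided below.

Context: {context}
Question: {input}
`);
```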
3. Web Scraping
Load content from FlightScope’s website (https://www.flightscope.com) to create a knowledge base:
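A sketch using LangChain’s Cheerio-based web loader (continuing from the imports above):

```javascript
// Scrape the FlightScope homepage into LangChain Document objects.
const loader = new CheerioWebBaseLoader("https://www.flightscope.com");
const rawDocs = await loader.load();
console.log(`Loaded ${rawDocs.length} document(s)`);
```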
4. Splitting Text and Generating Embeddings
Split the documents into smaller chunks and generate vector embeddings:
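A sketch of this step; the chunk size and overlap values are illustrative starting points, not tuned figures:

```javascript
// Split the scraped pages into overlapping chunks for embedding.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,   // characters per chunk (tune for your content)
  chunkOverlap: 200, // overlap preserves context across chunk boundaries
});
const splitDocs = await splitter.splitDocuments(rawDocs);

// Use the nomic-embed-text model pulled earlier to generate embeddings.
const embeddings = new OllamaEmbeddings({
  baseUrl: OLLAMA_BASE_URL,
  model: "nomic-embed-text",
});
```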
5. Storing Data in Vector Store and Setting Up Retriever
Store the generated embeddings in a vector store and set up a retriever:
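A sketch using LangChain’s in-memory vector store, which is enough for a local demo (a persistent store would be swapped in for production):

```javascript
// Embed the chunks into an in-memory vector store, then expose a retriever.
const vectorStore = await MemoryVectorStore.fromDocuments(splitDocs, embeddings);
const retriever = vectorStore.asRetriever({ k: 4 }); // return the 4 closest chunks
```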
6. Creating the RAG Chain
Combine the LLM with retrieved documents for generating contextual answers:
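A sketch of the chain assembly: the "stuff" chain injects the retrieved chunks into the prompt’s {context} slot, and the retrieval chain wires the retriever in front of it:

```javascript
// Combine the retrieved documents with the prompt and model...
const combineDocsChain = await createStuffDocumentsChain({
  llm: model,
  prompt,
});

// ...then attach the retriever to form the full RAG chain.
const retrievalChain = await createRetrievalChain({
  retriever,
  combineDocsChain,
});
```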
7. Querying the System (Retrieve information from the Vector DB)
Finally, test the system by querying it about FlightScope Mevo+:
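A sketch of the query step; the question text is just an example:

```javascript
// Ask the chain a question; the retriever supplies FlightScope context.
const response = await retrievalChain.invoke({
  input: "What is the FlightScope Mevo+ and what does it measure?",
});
console.log(response.answer);
```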
Section E: Running the App
Save your code as retrieval-chains.js and run the app with the following command:
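With the Ollama server running, launch the app with Node:

```shell
node retrieval-chains.js
```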
Sample Response
Here’s a real response generated by the app:
Get the code from my GitHub Repository:
To access the full code and instructions, visit the GitHub repository:
👉 Local-RAG-Ollama-llama3.2-1b
Conclusion
Congratulations! You’ve built a fully functional, local question-answering system using LangChain and Llama3.2:1b. This system not only delivers accurate answers but also engages users with friendly, interactive responses according to the system prompt that we created.
Whether for customer support, product inquiries, or personal projects, retrieval-chains.js showcases the potential of modern AI solutions.
Written by: Henri Johnson [M-Eng (Electronic)] Founder & CEO, FlightScope
#LangChain #Llama3.2 #Ollama #FlightScope #OpenSourceAI #RAGApplications