LocalAI is the free, Open Source OpenAI alternative. LocalAI acts as a drop-in replacement REST API that is compatible with the OpenAI API specifications for local inferencing. It allows you to run LLMs, generate images and audio (and more) locally or on-prem with consumer-grade hardware, supporting multiple model families and architectures.

LocalAI is available as a container image and as a binary. The container images work with container engines such as Docker and Podman and can be deployed on Kubernetes; they are published on quay.io and Docker Hub. Binaries can be downloaded from GitHub.
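
For example, you can pull the CPU AIO image used later in this guide from either registry (the Docker Hub tag is taken from the run command below; the quay.io tag is assumed to mirror it):

  # From Docker Hub
  docker pull localai/localai:latest-aio-cpu
  # From quay.io (tag assumed to mirror the Docker Hub one)
  docker pull quay.io/go-skynet/local-ai:latest-aio-cpu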

Prerequisites

Before you begin, ensure you have a container engine installed if you are not using the binaries. Suitable options include Docker and Podman; for installation instructions, refer to their respective documentation.
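
A quick way to verify that a container engine is installed and running:

  docker --version   # or: podman --version
  docker info        # fails if the Docker daemon is not running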

Running LocalAI with All-in-One (AIO) Images

Do you already have a model file? Skip to Run models manually or Run other models to use an already-configured model.

LocalAI’s All-in-One (AIO) images are pre-configured with a set of models and backends that take advantage of nearly the entire LocalAI feature set.

These images are available for both CPU and GPU environments. The AIO images are designed to be easy to use and require no configuration.

We suggest using the AIO images if you don’t want to configure the models yourself. If you want to run specific models, use the manual method instead.

The AIO images come pre-configured with the following features; a sketch of how they map to API endpoints follows the list:

  • Text to Speech (TTS)
  • Speech to Text
  • Function calling
  • Large Language Models (LLM) for text generation
  • Image generation
  • Embedding server
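
As a rough sketch, each of these features is served on an OpenAI-compatible endpoint once the server is running (started below). The model names used here (gpt-4, text-embedding-ada-002) are assumed to be among those the AIO images pre-configure:

  # Text generation (model name assumed)
  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'

  # Embeddings (model name assumed)
  curl http://localhost:8080/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{"model": "text-embedding-ada-002", "input": "A test sentence"}'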

Start the image with Docker:

  docker run -p 8080:8080 --name local-ai -ti localai/localai:latest-aio-cpu
  # For Nvidia GPUs:
  # docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-cuda-11
  # docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-cuda-12
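
Once the container is up, you can check that the API is ready and list the pre-configured models. The /readyz endpoint is the same one the compose health check below uses; /v1/models follows the OpenAI API spec:

  # Wait for the API to report ready
  curl http://localhost:8080/readyz

  # List the pre-configured models
  curl http://localhost:8080/v1/models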

Or with a docker-compose file:

  version: "3.9"
  services:
    api:
      image: localai/localai:v2.11.0-aio-cpu
      # For Nvidia GPUs, uncomment one of the following (cuda11 or cuda12):
      # image: localai/localai:v2.11.0-aio-gpu-cuda-11
      # image: localai/localai:v2.11.0-aio-gpu-cuda-12
      healthcheck:
        test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
        interval: 1m
        timeout: 120m
        retries: 120
      ports:
        - 8080:8080
      environment:
        - DEBUG=true
        # ...
      volumes:
        - ./models:/build/models:cached
      # Uncomment the following block if running with Nvidia GPUs
      # deploy:
      #   resources:
      #     reservations:
      #       devices:
      #         - driver: nvidia
      #           count: 1
      #           capabilities: [gpu]
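
Save the file as docker-compose.yaml and start the stack (the service name api matches the file above; on the first run the AIO image downloads its pre-configured models, which can take a while):

  docker compose up -d
  # Follow the startup logs
  docker compose logs -f api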

For a list of all the available container images, see Container images. To learn more about the All-in-One images, see All-in-one Images.

What’s next?

Explore further resources and community contributions.
