Local Generative AI Models

Privacy-first LLMs for text transformation, summarization, and AI commands running locally.

Phi-4 Mini 3.8B

LLM 2.2 GB

Latest Microsoft Phi-4 Mini ONNX export. State-of-the-art small model. ~2.3GB.

VRAM 4 GB

CPU Intel Core i5 / Apple M2 / Snapdragon X Elite

Secure Local Inference

Phi-3 Mini 4K Instruct

LLM 2.7 GB

Microsoft Phi-3 Mini 4K Instruct. Efficient small model with strong reasoning. ~2.1GB.

VRAM 4 GB

CPU Intel Core i5 / Apple M2 / Snapdragon X Elite

Secure Local Inference

Phi-3 Small 8K Instruct

LLM 5.0 GB

Microsoft Phi-3 Small 8K Instruct. Strong mid-range model. ~4.2GB.

VRAM 8 GB

CPU Intel Core i7 / Apple M3 / Snapdragon X Elite

Secure Local Inference

Yi 1.5 6B Chat

LLM 3.8 GB

High-performance medium model by 01.AI. Balanced speed and intelligence. ~3.8GB.

VRAM 8 GB

CPU Intel Core i7 / Apple M3 / Snapdragon X Elite

Secure Local Inference

Llama-3 8B Instruct (FP16)

LLM 16.0 GB

High-fidelity FP16 export of Llama-3 8B. Requires significant VRAM. ~16GB.

VRAM 24 GB

CPU Intel Core i9 / Apple M4 / Snapdragon X Elite

Secure Local Inference

DeepSeek-R1 Distill Qwen 1.5B

LLM 1.0 GB

Efficient small distilled model by DeepSeek. INT4-quantized CPU build.

VRAM 2 GB

CPU Intel Core i3 / Apple M1 / Snapdragon 8cx

Secure Local Inference

DeepSeek-R1 Distill Qwen 7B

LLM 6.7 GB

Powerful distilled model by DeepSeek. INT4-quantized CPU build. ~6.7GB.

VRAM 8 GB

CPU Intel Core i7 / Apple M3 / Snapdragon X Elite

Secure Local Inference

Llama 3.2 1B Instruct

LLM 1.9 GB

Ultra-lightweight Meta Llama 3.2 1B Instruct ONNX model. Fast and efficient. ~1.1GB.

VRAM 2 GB

CPU Intel Core i3 / Apple M1 / Snapdragon 8cx

Secure Local Inference
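
The VRAM figures on the cards above can be used to shortlist models for a given machine. The following is a minimal sketch, not part of the product: it assumes each card's VRAM value is a minimum requirement and that the "LLM … GB" badge is the download size. The names `LocalModel`, `CATALOG`, and `models_that_fit` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalModel:
    name: str
    download_gb: float  # the "LLM ... GB" badge on the card (assumed download size)
    min_vram_gb: int    # the "VRAM" value on the card (assumed to be a minimum)


# Values copied from the cards above.
CATALOG = [
    LocalModel("Phi-4 Mini 3.8B", 2.2, 4),
    LocalModel("Phi-3 Mini 4K Instruct", 2.7, 4),
    LocalModel("Phi-3 Small 8K Instruct", 5.0, 8),
    LocalModel("Yi 1.5 6B Chat", 3.8, 8),
    LocalModel("Llama-3 8B Instruct (FP16)", 16.0, 24),
    LocalModel("DeepSeek-R1 Distill Qwen 1.5B", 1.0, 2),
    LocalModel("DeepSeek-R1 Distill Qwen 7B", 6.7, 8),
    LocalModel("Llama 3.2 1B Instruct", 1.9, 2),
]


def models_that_fit(vram_gb, catalog=CATALOG):
    """Return the models whose listed VRAM is within vram_gb, largest download first."""
    fitting = [m for m in catalog if m.min_vram_gb <= vram_gb]
    return sorted(fitting, key=lambda m: m.download_gb, reverse=True)


if __name__ == "__main__":
    for model in models_that_fit(8):
        print(f"{model.name}: {model.download_gb} GB download, {model.min_vram_gb} GB VRAM")
```

Sorting by download size is only a rough proxy for capability; a real selector would likely also weigh the CPU tier listed on each card.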