Ponkotsu LLM

● Live

A lightweight language model running on a Raspberry Pi. It's no good at hard problems, but for small talk and a bit of writing help, it does its honest best. Responses stream back token by token.

Endpoint: POST /api/chat
I/O: text->text · streaming

Demo

chat.demo

⌘/Ctrl + Enter

How to use

Ollama-compatible. Use the official ollama package and just point the host at ponkotsu-lab.net.

Install

npm install ollama

Try it

import { Ollama } from "ollama";

const ollama = new Ollama({ host: "https://ponkotsu-lab.net" });

const res = await ollama.chat({
  model: "gemma4:e2b",
  messages: [{ role: "user", content: "Write a short story" }],
  stream: true,
});

for await (const part of res) {
  process.stdout.write(part.message.content); // print each token as it arrives
}

Run it and the text streams into your terminal piece by piece.

Notes

The model is gemma4:e2b.
In the browser, use import { Ollama } from "ollama/browser".
The thinking phase is disabled server-side, so the answer starts right away.

Limitations (the ponkotsu bits)

Input and output together are capped at ~1024 tokens (older input is trimmed if you go over).
Being underpowered, long text and complex reasoning are not its strength.
Under load you may be rate-limited and put in a queue.
Open to everyone, but there is a per-IP usage cap.
Runs on a lightweight model (powered by Ollama / gemma).