Ponkotsu LLM
● Live
A lightweight language model running on a Raspberry Pi. It's no good at hard problems, but for small talk and a bit of writing help, it does its honest best. Responses stream back token by token.
- Endpoint: POST /api/chat
- I/O: text->text · streaming
How to use
Ollama-compatible. Use the official ollama package and just point the host at ponkotsu-lab.net.
Install
npm install ollama
Try it
import { Ollama } from "ollama";

const ollama = new Ollama({ host: "https://ponkotsu-lab.net" });
const res = await ollama.chat({
  model: "gemma4:e2b",
  messages: [{ role: "user", content: "Write a short story" }],
  stream: true,
});
for await (const part of res) {
  process.stdout.write(part.message.content); // print each token as it arrives
}
Run it and the text streams into your terminal piece by piece.
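If you would rather not install the package, the same endpoint can probably be called with plain fetch. The sketch below assumes the server mirrors Ollama's own /api/chat wire format, where the response body is newline-delimited JSON and each line carries a message.content fragment. The helper names splitNdjson and streamChat are made up for this example.

```javascript
// Split a growing text buffer into complete NDJSON objects plus a leftover tail.
function splitNdjson(buffer) {
  const lines = buffer.split("\n");
  const rest = lines.pop(); // the last piece may be an incomplete line
  const objects = lines
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  return { objects, rest };
}

// Stream a chat reply with plain fetch (Node 18+ or a browser).
// Assumption: the server speaks the same NDJSON format as Ollama's /api/chat.
async function streamChat(prompt, onToken, host = "https://ponkotsu-lab.net") {
  const res = await fetch(`${host}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gemma4:e2b",
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const { objects, rest } = splitNdjson(buffer);
    buffer = rest;
    for (const obj of objects) onToken(obj.message?.content ?? "");
  }
  // A final line without a trailing newline is still a complete object.
  if (buffer.trim() !== "") onToken(JSON.parse(buffer).message?.content ?? "");
}

// Example call (commented out so the block does not hit the network on load):
// streamChat("Write a short story", (t) => process.stdout.write(t));
```

Buffering matters here because a network chunk can end in the middle of a JSON line; splitNdjson holds the unfinished tail until the rest arrives.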
Notes
- The model is gemma4:e2b.
- In the browser, use import { Ollama } from "ollama/browser".
- The thinking phase is disabled server-side, so the answer starts right away.
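In the browser there is no process.stdout, so the streamed parts need to go somewhere else. As a rough sketch, the helper below (collectStream is a made-up name) consumes any async iterable of chat parts and reports the accumulated text after each one. The element id "output" in the commented usage is also an assumption.

```javascript
// Consume an async iterable of Ollama-style chat parts. After each part,
// call onText with the full text received so far. Resolves to the final text.
async function collectStream(parts, onText = () => {}) {
  let text = "";
  for await (const part of parts) {
    text += part.message?.content ?? "";
    onText(text);
  }
  return text;
}

// Hypothetical browser usage (assumes an element with id "output" exists):
// import { Ollama } from "ollama/browser";
// const ollama = new Ollama({ host: "https://ponkotsu-lab.net" });
// const res = await ollama.chat({
//   model: "gemma4:e2b",
//   messages: [{ role: "user", content: "Write a short story" }],
//   stream: true,
// });
// await collectStream(res, (t) => {
//   document.getElementById("output").textContent = t;
// });
```

Passing the whole text rather than each fragment keeps the DOM update a simple assignment.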
Limitations (the ponkotsu bits)
- Input and output together are capped at ~1024 tokens (older input is trimmed if you go over).
- The hardware is underpowered, so long text and complex reasoning are not its strengths.
- Under load you may be rate-limited and put in a queue.
- Open to everyone, but there is a per-IP usage cap.
- Runs on a lightweight model (powered by Ollama / gemma).
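Since older input is trimmed once the ~1024-token cap is exceeded, it may be worth trimming chat history on the client so you decide what gets dropped. The sketch below is only an approximation: it assumes roughly 4 characters per token, which is not the model's real tokenizer, and the default budget of 700 is an arbitrary choice meant to leave room for the reply. Both function names are made up for this example.

```javascript
// Rough token estimate. Assumption: about 4 characters per token, a ballpark
// for English text only, not the model's actual tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep the newest messages that fit within `budget` estimated tokens.
// The latest message is always kept, even if it alone exceeds the budget.
function trimHistory(messages, budget = 700) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (kept.length > 0 && used + cost > budget) break;
    kept.unshift(messages[i]); // preserve chronological order
    used += cost;
  }
  return kept;
}
```

Pass the result as the messages array in the chat call. Walking from the newest message backwards means the most recent turns survive, which matches the "older input is trimmed" behavior described above.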