Interview

Fastino raises $17.5M from Khosla to train sub-billion-parameter AI models on consumer GPUs

May 9, 2025 with George Maloney

Key Points

  • Fastino raises $17.5M from Khosla Ventures to commercialize task-specific language models trained on consumer GPUs for under $100K, targeting enterprises drowning in large-model API costs.
  • The startup's sub-billion-parameter models sacrifice generality for precision on narrow tasks like text-to-SQL and JSON parsing, with inference fast enough for latency-sensitive workloads.
  • Founders George and Ash built Fastino after watching AI API bills balloon past headcount costs across portfolio companies, betting that frontier models won't solve the cost-latency-accuracy tradeoff for enterprises.
Fastino, a startup training sub-billion-parameter AI models on consumer gaming GPUs, has raised $17.5 million from Khosla Ventures. The company was founded by George and his co-founder Ash, whose origin story runs directly through the problem Fastino is trying to solve: while running a Cursor-style developer agent in 2023, Ash found that LLM API costs exceeded his headcount costs. George, then investing as a GP after selling a previous company, saw the same pattern across his portfolio.

"We launched TLMs: task-specific language models," George says. "They're small, lightweight models that are really fast and built for AI developers. Being task-specific really means we're more accurate on enterprise tasks than large models like OpenAI or Gemini. We spent less than $100K on GPUs to train our models, but we're beating industry benchmarks for enterprise tasks... All of our models are far below a billion parameters."

The core thesis is that GPT-4 and Gemini are consumer products trained on trillions of data points to handle open-ended queries, and that makes them expensive and imprecise for the narrow, high-volume tasks enterprises actually run at scale. Fastino's answer is a family of models it calls Task-Specific Language Models (TLMs): purpose-built, not fine-tuned or distilled from existing transformer models. George says the architecture is proprietary and undisclosed in detail, but all models sit below one billion parameters, were trained for under $100,000 in GPU costs, and run inference in milliseconds.

Accuracy, George argues, improves as the task becomes more tightly scoped. The trade-off is explicit: these are not generalist models. The initial TLM lineup targets developer infrastructure use cases: text-to-JSON, text-to-SQL, agentic function calling, document parsing, and PII redaction. That last category is positioned at banks and insurance companies. There's also a profanity-detection model, which George describes as the team's internal favorite and the most entertaining red-teaming exercise.
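To make concrete how narrowly such a task can be scoped, here is a toy PII-redaction sketch. It uses regexes rather than any model, and has no connection to Fastino's proprietary architecture; it only illustrates the shape of the task (fixed input, fixed replacement schema, no open-ended generation):

```python
import re

# Toy stand-in for a PII-redaction task: swap emails and US-style
# phone numbers for placeholder tags. Illustrative only -- this is
# regex matching, not Fastino's model.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def redact_pii(text: str) -> str:
    """Replace emails and phone numbers with placeholder tags."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

print(redact_pii("Reach Ash at ash@example.com or 415-555-0123."))
# Reach Ash at [EMAIL] or [PHONE].
```

The point of the illustration is that a task this constrained can be evaluated exactly, which is what makes "more accurate than large models on enterprise tasks" a benchmarkable claim rather than a vibe.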

Deployment is API-first, but the small parameter footprint opens on-premises and CPU-level inference as realistic options for customers who want to avoid round-trips to an external API. George frames this as a meaningful differentiator for latency-sensitive enterprise workloads.

The commercial argument rests on three pressure points that large LLMs don't resolve well for enterprises: cost, latency, and task accuracy. Whether Fastino's architecture can hold that edge as frontier models get cheaper and smaller remains the open question, but the $17.5 million from Khosla suggests at least one major investor thinks the wedge is real.
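The cost pressure Ash hit in 2023 can be made concrete with back-of-envelope arithmetic. Every number below is an illustrative assumption, not Fastino's or any provider's actual pricing or volume:

```python
# Hypothetical numbers for illustration only -- not real pricing.
requests_per_day = 1_000_000        # a high-volume, narrow enterprise task
tokens_per_request = 500            # assumed prompt + completion size
api_price_per_1k_tokens = 0.01      # assumed frontier-model rate, USD

monthly_api_cost = (
    requests_per_day * 30 * tokens_per_request / 1000 * api_price_per_1k_tokens
)
print(f"${monthly_api_cost:,.0f}/month")  # $150,000/month at these assumptions
```

At these assumed volumes, a single month of API spend already exceeds the sub-$100K one-time GPU budget Fastino cites for training a TLM, which is the pattern that pushed API bills past headcount costs in the founders' telling.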

Every deal, every interview. 5 minutes.

TBPN Digest delivers summaries of the latest fundraises, interviews and tech news from TBPN, every weekday.