AI Curator

The Data OS for LLM Fine-Tuning

View on GitHub | MIT License | Privacy-First | Local SQLite

Capture Everything. Curate with Precision. Train with Confidence.

AI Curator is the data operating system for fine-tuning. It connects to any data source — your IDE, Slack, logs, OpenWebUI conversations — and transforms raw data into training-ready datasets. Built for professionals who need robust data pipelines, but simple enough for beginners to master in minutes.

Universal Capture

Real-time capture from any tool via HTTP API. Slack, logs, IDEs, custom scripts — if it can make an HTTP request, it can stream to AI Curator.

Quality Control

Review, rate, approve or reject samples. Star ratings, categories, tags, and duplicate detection ensure your dataset is production-ready.
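To make the duplicate-detection idea concrete, here is a minimal sketch of one common approach: fingerprint each sample after normalizing case and whitespace, then flag repeats. This is an illustration only and not necessarily how AI Curator detects duplicates; the `instruction` and `output` field names and both function names are assumptions.

```python
import hashlib


def fingerprint(instruction: str, output: str) -> str:
    """Hash a case- and whitespace-normalized copy of the sample text."""
    # split() with no arguments collapses any run of whitespace.
    normalized = " ".join(f"{instruction}\n{output}".lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(samples):
    """Return the indices of samples whose fingerprint was already seen."""
    seen = set()
    duplicates = []
    for index, sample in enumerate(samples):
        key = fingerprint(sample["instruction"], sample["output"])
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


# Hypothetical samples: the second differs from the first only in case and spacing.
samples = [
    {"instruction": "What is SQLite?", "output": "An embedded SQL database."},
    {"instruction": "what is  sqlite?", "output": "An embedded SQL database."},
    {"instruction": "What is JSONL?", "output": "One JSON object per line."},
]
print(find_duplicates(samples))  # → [1]
```

Exact-match hashing like this only catches trivial variants. Near-duplicates (paraphrases, reordered sentences) would need a similarity measure instead.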

Export Ready

Export to 7 formats, including Alpaca, MLX, JSONL, CSV, Unsloth, and TRL. Smart splitting, filtering, and stratification are included.
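As a rough picture of what an export with stratified splitting involves, the sketch below converts samples to the Alpaca layout (`instruction`, `input`, `output`) and splits them so each category keeps roughly the same train/validation ratio. It is a hand-rolled illustration, not AI Curator's exporter; the `category` field, the function names, and the 80/20 default are assumptions.

```python
import json
import random
from collections import defaultdict


def to_alpaca(sample):
    """Map a sample onto the three Alpaca fields; a missing input becomes ''."""
    return {
        "instruction": sample["instruction"],
        "input": sample.get("input", ""),
        "output": sample["output"],
    }


def stratified_split(samples, train_fraction=0.8, seed=0):
    """Split per category so both sets keep a similar category mix."""
    by_category = defaultdict(list)
    for sample in samples:
        by_category[sample.get("category", "uncategorized")].append(sample)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, validation = [], []
    for category in sorted(by_category):
        group = list(by_category[category])
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        validation.extend(group[cut:])
    return train, validation


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
```

With 10 samples in one category and 5 in another, the default settings put 8 and 4 in the training set and leave 2 and 1 for validation.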

Capture from Any Source

Slack

Capture conversations from Slack channels. Export team discussions, support tickets, or internal knowledge sharing as training data.

curl -X POST http://localhost:3333/api/capture -H 'Content-Type: application/json' -d '{"source":"slack","records":[...]}'
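The `records` array is elided in the command above, and the page does not document its schema. The sketch below shows one plausible way to build it from Slack threads, pairing the first message (the question) with the first reply (the answer). The `text` and `ts` keys come from Slack's message objects; the record fields `instruction`, `output`, and `metadata`, and both function names, are assumptions.

```python
def slack_thread_to_record(messages):
    """Turn one Slack thread into an instruction/output record, or None."""
    # Slack timestamps are strings such as "1700000000.000100".
    ordered = sorted(messages, key=lambda message: float(message["ts"]))
    if len(ordered) < 2:
        return None  # a thread without a reply has no answer to learn from
    question, answer = ordered[0], ordered[1]
    return {
        "instruction": question["text"].strip(),
        "output": answer["text"].strip(),
        "metadata": {"thread_ts": question["ts"]},
    }


def build_capture_payload(threads):
    """Wrap the usable threads in the {"source", "records"} envelope."""
    records = []
    for thread in threads:
        record = slack_thread_to_record(thread)
        if record is not None:
            records.append(record)
    return {"source": "slack", "records": records}
```

The returned dictionary can be serialized with `json.dumps` and sent as the request body.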

Application Logs

Process logs in real-time. Extract error-resolution pairs, user interactions, or system events as training samples.

tail -f /var/log/app.log | processor | curl -X POST ...
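The `processor` stage in the pipeline above is a placeholder. As an illustration of what it might do, the sketch below pairs each error line with the resolution line that shares its ID. The log format (`ERROR [id] message` and `RESOLVED [id] message`) is entirely hypothetical, as are the output field names; a real processor would need a pattern that matches your own logs.

```python
import re

# Hypothetical log format: "<LEVEL> [<id>] <message>"
LINE = re.compile(r"^(?P<level>ERROR|RESOLVED)\s+\[(?P<id>[^\]]+)\]\s+(?P<message>.*)$")


def extract_pairs(lines):
    """Yield one record per error that is later resolved under the same ID."""
    open_errors = {}
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # ignore lines that are neither errors nor resolutions
        key = match["id"]
        if match["level"] == "ERROR":
            open_errors[key] = match["message"]
        elif key in open_errors:
            yield {"instruction": open_errors.pop(key), "output": match["message"]}


log_lines = [
    "INFO server started",
    "ERROR [req-7] connection pool exhausted",
    "RESOLVED [req-7] raised pool size to 50",
    "RESOLVED [req-9] no matching error, so this line is skipped",
]
for pair in extract_pairs(log_lines):
    print(pair)
```

Because `extract_pairs` consumes any iterable of lines lazily, passing it `sys.stdin` would let it sit behind `tail -f` and emit pairs as they complete.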

OpenWebUI

Official OpenWebUI plugin available. Automatically capture conversations from your self-hosted AI chat interface.

Plugin available in OpenWebUI community plugins

IDEs & Development

Capture code explanations, refactoring decisions, or debug sessions. Build coding assistants from your actual development workflow.

VS Code Extension → http://localhost:3333/api/capture

Any HTTP Source

The Live Capture API accepts data from any tool that can make an HTTP POST request. Custom scripts, internal tools, web apps — if it can send JSON, it can stream to AI Curator.
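For a custom script, posting to the capture endpoint needs only the Python standard library. This is a minimal sketch: the URL and the `source`/`records` envelope are taken from the Slack example on this page, while the function names, the `Content-Type` header, and the shape of each record are assumptions.

```python
import json
import urllib.request

CAPTURE_URL = "http://localhost:3333/api/capture"


def build_capture_request(source, records, url=CAPTURE_URL):
    """Build (but do not send) a JSON POST request for the capture endpoint."""
    body = json.dumps({"source": source, "records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_capture(source, records, url=CAPTURE_URL, timeout=5.0):
    """Send the records and return the HTTP status code.

    Raises urllib.error.URLError if the server is not reachable.
    """
    request = build_capture_request(source, records, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating request construction from sending keeps the payload logic testable without a running server. With AI Curator listening locally, a call might look like `send_capture("my-script", [{"instruction": "...", "output": "..."}])`.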

Three Ways to Work

01

Web UI

Point, click, curate. Perfect for manual quality control and visual dataset exploration.

  • Drag & drop import
  • Card-based sample review
  • One-click export
  • Visual dashboards

02

CLI

Automate everything. Built for professionals, CI/CD pipelines, and bulk operations.

  • Bulk import/export
  • HuggingFace search & download
  • Advanced filtering & splitting
  • Scriptable workflows

03

Live Capture API

Our most powerful feature. Stream data in real-time from any source via HTTP.

  • Real-time HTTP ingestion
  • Webhook integrations
  • Log processors
  • Custom script support

EdukaAI Starter Pack Included

AI Curator ships with 75 premium samples that are ready to train on immediately. The full 400-sample EdukaAI Starter Pack is available as a separate download.

Get the Starter Pack →
Included samples: 75
Starter Pack samples: 400

Start Capturing Today

Install AI Curator and start building your data pipeline in minutes. 100% local, privacy-first.