Myanmar NLP API

Production docs for segmentation and POS tagging.

Wordtoken exposes a focused API for Myanmar word segmentation, part-of-speech (POS) tagging, and batch inference. The service is live at wordtoken.ygn.app, where Caddy terminates HTTPS in front of a Hugging Face-backed XLM-RoBERTa + BiLSTM + CRF model.

Use this landing page for fast onboarding, and open the wiki for operational notes, deployment topology, and release guidance.

API Surface

What this service exposes

POST /api/v1/segment

Segment Myanmar text

Returns a list of token strings for a single input sentence.

curl -X POST https://wordtoken.ygn.app/api/v1/segment \
  -H 'content-type: application/json' \
  -d '{"text":"ကျွန်တော်သည်ကျောင်းသွားသည်"}'
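The same call can be made from Python. The sketch below uses only the standard library and mirrors the curl command above; the function names `build_segment_request` and `segment` are illustrative and not part of the service. Building the request is kept separate from sending it so the payload can be inspected without touching the network.

```python
import json
import urllib.request

BASE_URL = "https://wordtoken.ygn.app"  # host taken from the curl example above


def build_segment_request(text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/segment."""
    # ensure_ascii=False keeps Myanmar characters as UTF-8 instead of \u escapes.
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/segment",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def segment(text: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON body (performs network I/O)."""
    request = build_segment_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `segment("ကျွန်တော်သည်ကျောင်းသွားသည်")` sends the same JSON body as the curl command. The 10-second timeout is an arbitrary default, not a documented service limit.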

POST /api/v1/pos

Joint segmentation + POS

Returns token/POS pairs in one call for downstream NLP apps.

curl -X POST https://wordtoken.ygn.app/api/v1/pos \
  -H 'content-type: application/json' \
  -d '{"text":"ကျွန်တော်သည်ကျောင်းသွားသည်"}'

POST /api/v1/batch

Batch processing

Processes multiple strings per request with batch-size controls.

curl -X POST https://wordtoken.ygn.app/api/v1/batch \
  -H 'content-type: application/json' \
  -d '{"texts":["မင်္ဂလာပါ","hello world"]}'
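When a corpus is larger than one request should carry, a client can split it into several batch payloads. The sketch below shows one way to do that on the client side. The limit `MAX_TEXTS_PER_REQUEST` is a hypothetical value chosen for illustration; this page does not state the server's actual batch-size controls, so check /docs before relying on it.

```python
import json
from typing import Iterator

# Hypothetical client-side cap; the real limit is not documented on this page.
MAX_TEXTS_PER_REQUEST = 16


def chunked(texts: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `texts`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def batch_payloads(texts: list[str], size: int = MAX_TEXTS_PER_REQUEST) -> list[bytes]:
    """Encode one JSON body per chunk, in the {"texts": [...]} shape used above."""
    return [
        json.dumps({"texts": chunk}, ensure_ascii=False).encode("utf-8")
        for chunk in chunked(texts, size)
    ]
```

Each returned body can be sent to /api/v1/batch with the same `content-type: application/json` header as the curl example. Input order is preserved across chunks, so results can be concatenated in request order.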

Quick Start

First requests

The API is HTTPS-first. Caddy redirects plain HTTP requests to HTTPS, and the OpenAPI UIs are available at /docs and /redoc.

curl https://wordtoken.ygn.app/health

curl -X POST https://wordtoken.ygn.app/api/v1/segment \
  -H 'content-type: application/json' \
  -d '{"text":"မြန်မာဘာသာသည်လှပသောဘာသာတစ်ခုဖြစ်သည်"}'
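Because plain HTTP is redirected, clients may prefer to normalize URLs to HTTPS before sending anything: some HTTP clients turn a redirected POST into a GET or refuse to follow it, which can silently drop the JSON body. The helper below is a minimal sketch of that normalization; `ensure_https` is an illustrative name, not something the service provides.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_https(url: str) -> str:
    """Rewrite an http:// URL to https://; return any other URL unchanged."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    netloc = parts.netloc
    # An explicit :80 would be wrong for HTTPS, so drop it and use the default port.
    if parts.port == 80:
        netloc = netloc.rsplit(":", 1)[0]
    return urlunsplit(parts._replace(scheme="https", netloc=netloc))
```

For example, `ensure_https("http://wordtoken.ygn.app/health")` returns `"https://wordtoken.ygn.app/health"`, which avoids the redirect round trip entirely.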

Sample Output

Expected response shape for POST /api/v1/segment

{
  "input": "ကျွန်တော်သည်ကျောင်းသွားသည်",
  "words": [
    "ကျွန်တော်",
    "သည်",
    "ကျောင်း",
    "သွား",
    "သည်"
  ],
  "processing_time_ms": 105.12
}
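A client can validate this shape before passing tokens downstream. The sketch below parses a response body into a small typed record. The field names come from the sample above; the class name `SegmentResult` and the function `parse_segment_response` are illustrative, and the strictness of the checks is an assumption rather than a documented contract.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentResult:
    input: str
    words: list[str]
    processing_time_ms: float


def parse_segment_response(raw: str) -> SegmentResult:
    """Decode a JSON body shaped like the sample above, or raise on mismatch."""
    data = json.loads(raw)
    words = data["words"]  # KeyError here means the body has a different shape
    if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
        raise ValueError("expected 'words' to be a list of strings")
    return SegmentResult(
        input=str(data["input"]),
        words=list(words),
        processing_time_ms=float(data["processing_time_ms"]),
    )
```

In the sample above, joining `words` with an empty string reproduces `input`. This page does not say whether that holds for inputs containing whitespace or mixed scripts, so treat it as a property of this example only.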

Links

Where to go next

Resource             Purpose
/wiki                Operations handbook, deployment topology, and CI/CD notes.
/docs                Swagger UI backed by the live OpenAPI schema.
/redoc               Alternative API reference with a long-form layout.
GitHub repository    Source code, deploy workflow, and infrastructure templates.