For this feature, we will add several APIs for managing models and training. Based on some offline discussions, I have decided to refactor some of the APIs to be more CRUD-like.
For training, users will create training jobs and get their status through the training job APIs:
// Gets information about training jobs on a particular node or all nodes
GET /_plugins/_knn/{node_id}/train-jobs/{model_id}
{
"node_1": {
"model_1": {
"status": "IN_PROGRESS",
"train_index": "train-index-name",
"train_field": "train-field-name",
"dimension": 16,
"method": {
...
}
},
...
},
...
}
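As a rough illustration of how a client might consume the nested node -> model -> job response above, here is a minimal Python sketch. The helper name `jobs_with_status` is hypothetical, and `"IN_PROGRESS"` is the only status value the proposal actually shows; any other status string is an assumption.

```python
from typing import Dict, List, Tuple


def jobs_with_status(
    response: Dict[str, Dict[str, dict]], status: str
) -> List[Tuple[str, str]]:
    """Return (node_id, model_id) pairs whose training job has the given status.

    Assumes the response shape sketched in the proposal:
    {node_id: {model_id: {"status": ..., ...}}}.
    """
    matches = []
    for node_id, jobs in response.items():
        for model_id, job in jobs.items():
            if job.get("status") == status:
                matches.append((node_id, model_id))
    return matches


# Example response, trimmed to the fields needed here.
example = {
    "node_1": {
        "model_1": {"status": "IN_PROGRESS", "dimension": 16},
    }
}
in_progress = jobs_with_status(example, "IN_PROGRESS")
```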
// Submit a training job. If a user specifies the node_id, the request should get routed there.
PUT /_plugins/_knn/{node_id}/train-jobs/{model_id}
{
"train_index": "train-index-name",
"train_field": "train-field-name",
"dimension": 16,
"method": {
"name":"ivf",
"engine":"faiss",
"space_type": "l2",
"parameters":{
"ncentroids":128,
"coarse_quantizer":{
"name":"ivf",
"parameters":{
"ncentroids":15
}
},
"encoder":{
"name":"pq",
"parameters":{
"code_size":8
}
}
}
}
}
{
"acknowledged": true,
"model_id": "custom-model-id"
}
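To make the submit request concrete, here is a small Python sketch that assembles the path and JSON body for the training job call. The function names are hypothetical. The proposal does not show what the path looks like when `node_id` is omitted, so the fallback path below is purely an assumption.

```python
import json
from typing import Optional


def train_job_path(model_id: str, node_id: Optional[str] = None) -> str:
    """Build the path for the proposed train-jobs endpoint."""
    if node_id is not None:
        return f"/_plugins/_knn/{node_id}/train-jobs/{model_id}"
    # Assumption: the proposal does not specify the path without a node_id.
    return f"/_plugins/_knn/train-jobs/{model_id}"


def train_job_body(train_index: str, train_field: str, dimension: int, method: dict) -> str:
    """Serialize the request body using the field names from the proposal."""
    return json.dumps(
        {
            "train_index": train_index,
            "train_field": train_field,
            "dimension": dimension,
            "method": method,
        }
    )


# Method definition copied from the example request above (abbreviated).
method = {
    "name": "ivf",
    "engine": "faiss",
    "space_type": "l2",
    "parameters": {"ncentroids": 128},
}
path = train_job_path("custom-model-id", node_id="node_1")
body = train_job_body("train-index-name", "train-field-name", 16, method)
```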
A user will also have APIs to get/create/delete the models present in the cluster:
// Return the model (or all models if model_id is not set)
GET /_plugins/_knn/models/{model_id}?{field_filter_id1}&{field_filter_id2}
{
"model_id": {
"engine": "engine-of-the-model",
"dimension": X,
"space_type": "space-type-of-the-model",
"model_blob": "some base64 encoded string"
},
...
}
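The field filters on the GET call could be assembled as in the following Python sketch. It reads the template `?{field_filter_id1}&{field_filter_id2}` literally, treating each filter as a bare query key joined with `&`; that reading, the helper name `models_path`, and the example filter names are all assumptions.

```python
from typing import Optional, Sequence
from urllib.parse import quote


def models_path(
    model_id: Optional[str] = None, field_filters: Sequence[str] = ()
) -> str:
    """Build the GET path for one model, or for all models if model_id is unset."""
    path = "/_plugins/_knn/models"
    if model_id:
        path += f"/{quote(model_id, safe='')}"
    if field_filters:
        # Assumption: filters are bare keys, e.g. "?engine&dimension".
        path += "?" + "&".join(quote(f, safe="") for f in field_filters)
    return path


one_model = models_path("model_1", ["engine", "dimension"])
all_models = models_path()
```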
// Delete the model from the model system index and cache. model_id is required.
DELETE /_plugins/_knn/models/{model_id}
{
"acknowledged": true
}
// Upload API
PUT /_plugins/_knn/models/{model_id}
{
"model_id": "custom-model-id",
"engine": "engine-of-the-model",
"dimension": X,
"space_type": "space-type-of-the-model",
"model_blob": "some base64 encoded string"
}
{
"acknowledged": true,
"model_id": "custom-model-id"
}
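For the upload call, a client would need to base64-encode the serialized model before placing it in `model_blob`. A minimal Python sketch, assuming standard base64 (the proposal only says "some base64 encoded string") and a hypothetical helper name `upload_body`:

```python
import base64
import json


def upload_body(
    model_id: str, engine: str, dimension: int, space_type: str, model_bytes: bytes
) -> dict:
    """Build the upload request body with the field names from the proposal."""
    return {
        "model_id": model_id,
        "engine": engine,
        "dimension": dimension,
        "space_type": space_type,
        # Assumption: standard (non-URL-safe) base64, decoded to ASCII text for JSON.
        "model_blob": base64.b64encode(model_bytes).decode("ascii"),
    }


# Placeholder bytes standing in for a serialized model.
raw_model = b"\x00\x01example-model-bytes"
body = upload_body("custom-model-id", "faiss", 16, "l2", raw_model)
payload = json.dumps(body)
```

Decoding `model_blob` with `base64.b64decode` recovers the original bytes, which is what the server side would presumably do before caching the model.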
@wnbts could you take a look at these APIs and offer any recommendations you have that could improve them?