MLflow

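The sktime custom model flavor lets you log sktime models in MLflow format through the sktime.utils.mlflow_sktime.save_model() and sktime.utils.mlflow_sktime.log_model() methods. These methods also add the pyfunc flavor to the MLflow models they produce, so that a model can be interpreted as a generic Python function for inference via sktime.utils.mlflow_sktime.pyfunc.load_model(). A model loaded as PyFunc can only be scored with DataFrame input. You can also use sktime.utils.mlflow_sktime.load_model() to load an MLflow model with the sktime flavor in its native sktime format.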

The pyfunc flavor of the model supports the sktime prediction methods predict, predict_interval, predict_proba, predict_quantiles, and predict_var.

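To generate predictions from a sktime model loaded as pyfunc, exogenous regressors must be passed to pyfunc.predict() as a pandas DataFrame (if no exogenous regressors are used, an empty DataFrame must be passed). The configuration for the prediction call (prediction methods and parameter values) is defined by a dictionary that must be saved as an attribute of the fitted sktime model instance (named pyfunc_predict_conf in the examples of this notebook). If no prediction configuration is defined, pyfunc.predict() returns the output of the sktime predict method. Note that for the pyfunc flavor the forecasting horizon fh must be passed to the fit method.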

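The prediction methods and parameter values for the pyfunc flavor can be defined in two ways:

- Dict[str, dict], if parameter values are passed to pyfunc.predict(), for example {"predict_method": {"predict": {}, "predict_interval": {"coverage": [0.1, 0.9]}}}
- Dict[str, list], to use the default parameters of the prediction methods, for example {"predict_method": ["predict", "predict_interval"]} (note: when the predict_proba method is included, the dict form must be used, because the quantiles parameter has to be provided by the user)

If no prediction configuration is defined, pyfunc.predict() returns the output of the sktime predict() method.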
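
The relationship between the two configuration shapes can be made concrete with a small sketch. The helper `normalize_predict_conf` below is hypothetical and not part of sktime; it only illustrates, under that assumption, how the list form could be expanded into the dict form and why predict_proba needs an explicit quantiles value.

```python
from typing import Any, Dict, List, Union

# Hypothetical helper, NOT part of sktime: it only illustrates how the
# two configuration shapes described above relate to each other.
PredictConf = Dict[str, Union[Dict[str, Dict[str, Any]], List[str]]]


def normalize_predict_conf(conf: PredictConf) -> Dict[str, Dict[str, Dict[str, Any]]]:
    """Expand the list form into the dict form with empty parameter dicts."""
    methods = conf["predict_method"]
    if isinstance(methods, list):
        # The list form means "use the default parameters of each method".
        methods = {name: {} for name in methods}
    # predict_proba has no default quantiles, so the user must supply them.
    if "predict_proba" in methods and "quantiles" not in methods["predict_proba"]:
        raise ValueError("predict_proba requires an explicit 'quantiles' value")
    return {"predict_method": dict(methods)}


short_form = {"predict_method": ["predict", "predict_interval"]}
print(normalize_predict_conf(short_form))
```

Both shapes describe the same set of calls; the dict form is only required once a method needs non-default arguments.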

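Signature logging does not work correctly for sktime model artifacts of the non-pyfunc (native) flavor when predict_interval or predict_quantiles is used. Because the returned DataFrame has MultiIndex columns, the output of these methods is not a recognized signature type for the native sktime flavor. With the pyfunc flavor of the model, however, MLflow's infer_schema works correctly.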

1. Setup

1.1 Config

[1]:

model_path = "model"

1.2 Imports

[2]:

import mlflow

from sktime.datasets import load_longley
from sktime.forecasting.naive import NaiveForecaster
from sktime.split import temporal_train_test_split
from sktime.utils import mlflow_sktime

1.3 Load example data

[3]:

y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)

2. Usage examples for the native `sktime flavor` and the `pyfunc flavor`

2.1 Create a prediction configuration for the pyfunc flavor

[4]:

coverage = [0.8, 0.9]
quantiles = [0.1, 0.9]

pyfunc_predict_conf = {
    "predict_method": {
        "predict": {},
        "predict_interval": {"coverage": coverage},
        "predict_proba": {"quantiles": quantiles},
        "predict_quantiles": {},
        "predict_var": {},
    }
}

2.2 Train and save the model

[5]:

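with mlflow.start_run():
    forecaster = NaiveForecaster()
    # The pyfunc flavor requires the forecasting horizon to be set at fit time.
    forecaster.fit(
        y_train,
        X=X_train,
        fh=[1, 2, 3],
    )
    # Attach the prediction configuration as an attribute of the fitted model.
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf

    mlflow_sktime.save_model(forecaster, model_path)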

2.3 Load the model

2.3.1 Native sktime flavor

[6]:

loaded_model = mlflow_sktime.load_model(model_path)

2.3.2 Pyfunc flavor

[7]:

loaded_pyfunc = mlflow_sktime.pyfunc.load_model(model_path)

2.4 Generate predictions

2.4.1 Native sktime flavor

[8]:

loaded_model.predict(X=X_test)

[8]:

1959    66513.0
1960    66513.0
1961    66513.0
Freq: A-DEC, dtype: float64

[9]:

loaded_model.predict_interval(X=X_test, coverage=coverage)

[9]:

	Coverage
	0.8		0.9
	lower	upper	lower	upper
1959	64719.913711	68306.086289	64211.598663	68814.401337
1960	63977.193051	69048.806949	63258.327017	69767.672983
1961	63407.283445	69618.716555	62526.855956	70499.144044

[10]:

y_pred_dist = loaded_model.predict_proba(X=X_test)
y_pred_dist_quantiles = y_pred_dist.quantile(quantiles)
y_pred_dist_quantiles

[10]:

	Quantiles_0.1	Quantiles_0.9
1959	64719.914062	68306.085938
1960	63977.191406	69048.804688
1961	63407.281250	69618.718750

[11]:

loaded_model.predict_quantiles(X=X_test)

[11]:

	Quantiles
	0.05	0.95
1959	64211.598663	68814.401337
1960	63258.327017	69767.672983
1961	62526.855956	70499.144044
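
The quantile output above coincides with the bounds of the 0.9-coverage interval shown earlier. A symmetric prediction interval with coverage c spans the quantile levels 0.5 - c/2 and 0.5 + c/2, so the two outputs describe the same quantities. The function name below is illustrative only; this is a minimal sketch of the arithmetic.

```python
from typing import Tuple


def coverage_to_quantiles(coverage: float) -> Tuple[float, float]:
    """Return the lower and upper quantile levels of a symmetric interval."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    # Rounding removes floating-point noise such as 0.09999999999999998.
    lower = round(0.5 - coverage / 2, 10)
    upper = round(0.5 + coverage / 2, 10)
    return lower, upper


for c in (0.8, 0.9):
    print(c, coverage_to_quantiles(c))
```

This is why coverage 0.8 lines up with the quantiles [0.1, 0.9] used for predict_proba, and coverage 0.9 with the [0.05, 0.95] columns of predict_quantiles.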

[12]:

loaded_model.predict_var(X=X_test)

[12]:

	0
1959	1.957628e+06
1960	3.915256e+06
1961	5.872885e+06
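
The variance, point forecast, and interval outputs are linked. As a worked check, and assuming the forecast distribution is normal (an assumption that the matching numbers in this notebook suggest, not something stated here), the 80% interval for 1959 can be approximately recovered from the displayed point forecast and variance using only the standard library.

```python
from math import sqrt
from statistics import NormalDist

point_forecast = 66513.0  # predict() output for 1959
variance = 1.957628e06    # predict_var() output for 1959, rounded as displayed

# Assumption: the forecast distribution is normal with this mean and variance.
dist = NormalDist(mu=point_forecast, sigma=sqrt(variance))

# Coverage 0.8 corresponds to the quantile levels 0.1 and 0.9.
lower, upper = dist.inv_cdf(0.1), dist.inv_cdf(0.9)
print(f"80% interval for 1959: [{lower:.2f}, {upper:.2f}]")
```

The result should agree with the predict_interval row for 1959 up to the rounding of the displayed variance.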

2.4.2 Pyfunc flavor

[13]:

loaded_pyfunc.predict(X_test)

[13]:

	predict__0	predict_interval__Coverage__0.8__lower	predict_interval__Coverage__0.8__upper	predict_interval__Coverage__0.9__lower	predict_interval__Coverage__0.9__upper	predict_proba__Quantiles_0.1	predict_proba__Quantiles_0.9	predict_quantiles__Quantiles__0.05	predict_quantiles__Quantiles__0.95	predict_var__0
1959	66513.0	64719.913711	68306.086289	64211.598663	68814.401337	64719.914062	68306.085938	64211.598663	68814.401337	1.957628e+06
1960	66513.0	63977.193051	69048.806949	63258.327017	69767.672983	63977.191406	69048.804688	63258.327017	69767.672983	3.915256e+06
1961	66513.0	63407.283445	69618.716555	62526.855956	70499.144044	63407.281250	69618.718750	62526.855956	70499.144044	5.872885e+06
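
The column names in the pyfunc output follow a visible pattern: the method name, followed by the levels of the original column label, joined by a double underscore. The helper below is a hypothetical sketch that reproduces this naming from plain Python labels; it is not sktime's implementation.

```python
def flatten_column(method: str, label: object, sep: str = "__") -> str:
    """Join a method name and a (possibly multi-level) column label."""
    # A MultiIndex column label is a tuple; a flat label is a single value.
    parts = label if isinstance(label, tuple) else (label,)
    return sep.join([method, *(str(p) for p in parts)])


print(flatten_column("predict", 0))
print(flatten_column("predict_interval", ("Coverage", 0.8, "lower")))
print(flatten_column("predict_quantiles", ("Quantiles", 0.05)))
```

With pandas, tuple labels of this kind could be obtained from MultiIndex columns, for example via `df.columns.to_flat_index()`. Flat string columns are what allow the schema of the pyfunc output to be inferred.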

3. Model deployment example

3.1 Create an experiment

[14]:

artifact_path = "model"

mlflow.set_experiment("Test Sktime")

with mlflow.start_run() as run:
    forecaster = NaiveForecaster()
    forecaster.fit(y_train, X=X_train, fh=[1, 2, 3])
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf

    mlflow_sktime.log_model(sktime_model=forecaster, artifact_path=artifact_path)

run_id = run.info.run_id
print(f"MLflow run id: {run_id}")

2022/12/19 10:07:21 INFO mlflow.tracking.fluent: Experiment with name 'Test Sktime' does not exist. Creating a new experiment.

MLflow run id: ec94d157fe354c1bbdd4dec0898e1ed6

3.2 Deploy the pyfunc model to a local REST API endpoint

Open a terminal window and cd into the examples directory.
Run in the terminal: mlflow models serve -m runs:/<RUN_ID>/model --env-manager local --host <HOST>
- Replace <RUN_ID> with the run_id and <HOST> with the network address to listen on (for example 127.0.0.1).
For more details, see: https://www.mlflow.org/docs/latest/cli.html#mlflow-models-serve
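
If the command is assembled from Python, the run ID printed earlier can be substituted programmatically. The helper `build_serve_command` below is a hypothetical sketch; it only builds the argument list for the command shown above and does not start a server.

```python
import shlex
from typing import List


def build_serve_command(
    run_id: str, host: str = "127.0.0.1", artifact_path: str = "model"
) -> List[str]:
    """Build the argument list for the serve command shown above."""
    # The model URI combines the run ID with the artifact path used in log_model.
    model_uri = f"runs:/{run_id}/{artifact_path}"
    return [
        "mlflow", "models", "serve",
        "-m", model_uri,
        "--env-manager", "local",
        "--host", host,
    ]


cmd = build_serve_command("ec94d157fe354c1bbdd4dec0898e1ed6")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd)` from the examples directory, which blocks while the server is running.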

3.3 Request predictions from the local REST API endpoint

For more details, see: https://www.mlflow.org/docs/latest/models.html#built-in-deployment-tools

3.3.1 JSON input using the `dataframe_split` field and a pandas DataFrame in `split` orientation

[15]:

host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

X_test = X_test.reset_index(drop=True)
json_data = {"dataframe_split": X_test.to_dict(orient="split")}
print(json_data)

# Uncomment the lines below to run the prediction request
# import requests
# response = requests.post(url, json=json_data)
# response.json()

{'dataframe_split': {'index': [0, 1, 2, 3], 'columns': ['GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP'], 'data': [[112.6, 482704.0, 3813.0, 2552.0, 123366.0], [114.2, 502601.0, 3931.0, 2514.0, 125368.0], [115.7, 518173.0, 4806.0, 2572.0, 127852.0], [116.9, 554894.0, 4007.0, 2827.0, 130081.0]]}}

3.3.2 JSON input using the `dataframe_records` field and a pandas DataFrame in `records` orientation

[16]:

json_data = {"dataframe_records": X_test.to_dict(orient="records")}
print(json_data)

# Uncomment the lines below to run the prediction request
# import requests
# response = requests.post(url, json=json_data)
# response.json()

{'dataframe_records': [{'GNPDEFL': 112.6, 'GNP': 482704.0, 'UNEMP': 3813.0, 'ARMED': 2552.0, 'POP': 123366.0}, {'GNPDEFL': 114.2, 'GNP': 502601.0, 'UNEMP': 3931.0, 'ARMED': 2514.0, 'POP': 125368.0}, {'GNPDEFL': 115.7, 'GNP': 518173.0, 'UNEMP': 4806.0, 'ARMED': 2572.0, 'POP': 127852.0}, {'GNPDEFL': 116.9, 'GNP': 554894.0, 'UNEMP': 4007.0, 'ARMED': 2827.0, 'POP': 130081.0}]}
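
The `split` and `records` payloads carry the same data in different layouts. As a minimal sketch that needs no pandas, the hypothetical helper below converts a `split`-oriented dictionary into the `records` layout, using the first two rows of the output printed above.

```python
from typing import Any, Dict, List


def split_to_records(split: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert a 'split'-oriented dict into a list of per-row dicts."""
    columns = split["columns"]
    # Each data row is paired with the column names; the index is dropped.
    return [dict(zip(columns, row)) for row in split["data"]]


split_payload = {
    "index": [0, 1],
    "columns": ["GNPDEFL", "GNP", "UNEMP", "ARMED", "POP"],
    "data": [
        [112.6, 482704.0, 3813.0, 2552.0, 123366.0],
        [114.2, 502601.0, 3931.0, 2514.0, 125368.0],
    ],
}
records = split_to_records(split_payload)
print({"dataframe_records": records})
```

The `split` layout states the column names once, which keeps the payload smaller for wide or long frames; `records` repeats them in every row but is easier to read.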

3.3.3 CSV input using a valid `pd.DataFrame` CSV representation

[17]:

headers = {
    "Content-Type": "text/csv",
}
data = X_test.to_csv()
print(data)

# Uncomment the lines below to run the prediction request
# import requests
# response = requests.post(url, headers=headers, data=data)
# response.json()

,GNPDEFL,GNP,UNEMP,ARMED,POP
0,112.6,482704.0,3813.0,2552.0,123366.0
1,114.2,502601.0,3931.0,2514.0,125368.0
2,115.7,518173.0,4806.0,2572.0,127852.0
3,116.9,554894.0,4007.0,2827.0,130081.0
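
For environments without the requests package, the same CSV request could be prepared with the standard library instead. This is a sketch that swaps in `urllib.request` for the requests calls used in this notebook; the helper name is hypothetical, and the request is only constructed, not sent.

```python
from urllib.request import Request


def build_csv_request(url: str, csv_text: str) -> Request:
    """Prepare a POST request that carries a CSV body."""
    return Request(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


# First row of the CSV representation printed above.
csv_text = ",GNPDEFL,GNP,UNEMP,ARMED,POP\n0,112.6,482704.0,3813.0,2552.0,123366.0\n"
req = build_csv_request("http://127.0.0.1:5000/invocations", csv_text)
print(req.get_method(), req.full_url, req.get_header("Content-type"))

# To send it once the server is running:
# from urllib.request import urlopen
# with urlopen(req, timeout=10) as response:
#     print(response.read().decode("utf-8"))
```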
