
MLflow#

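The sktime custom model flavor enables logging of sktime models in MLflow format via the sktime.utils.mlflow_sktime.save_model() and sktime.utils.mlflow_sktime.log_model() methods. These methods also add the pyfunc flavor to the MLflow models they produce, so that a model can be loaded with sktime.utils.mlflow_sktime.pyfunc.load_model() and interpreted as a generic Python function for inference. A model loaded this way can only be scored with a DataFrame input. To load an MLflow model with the sktime flavor in its native sktime format, use sktime.utils.mlflow_sktime.load_model().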

The pyfunc flavor of the model supports the sktime predict methods predict, predict_interval, predict_proba, predict_quantiles, and predict_var.

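To generate forecasts with a sktime model loaded as pyfunc, the exogenous regressors must be passed to pyfunc.predict() as a Pandas DataFrame (an empty DataFrame must be passed if no exogenous regressors are used). Which predict methods are called, and with which parameter values, is defined by a dictionary that must be saved as an attribute of the fitted sktime model instance. If no prediction configuration is defined, pyfunc.predict() returns the output of the sktime predict method. Note that for the pyfunc flavor the forecasting horizon fh must be passed to the fit method.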

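Predict methods and parameter values for the pyfunc flavor can be defined in two ways:

  • Dict[str, dict], if parameter values are passed to pyfunc.predict(), for example {"predict_method": {"predict": {}, "predict_interval": {"coverage": [0.1, 0.9]}}}

  • Dict[str, list], with default parameters in the predict methods, for example {"predict_method": ["predict", "predict_interval"]} (note: when the predict_proba method is included, the former form must be used, because the quantiles parameter has to be provided by the user)

  • If no prediction configuration is defined, pyfunc.predict() returns the output of the sktime predict() method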
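As a rough illustration of how the two forms relate, the sketch below converts the list form into the equivalent dict form and checks the predict_proba rule stated above. `normalize_predict_conf` and `SUPPORTED_METHODS` are hypothetical names introduced for this example; they are not part of sktime or MLflow.

```python
# Hypothetical helper, not part of sktime or MLflow.
SUPPORTED_METHODS = (
    "predict",
    "predict_interval",
    "predict_proba",
    "predict_quantiles",
    "predict_var",
)


def normalize_predict_conf(conf):
    """Return the Dict[str, dict] form of a pyfunc prediction configuration."""
    methods = conf["predict_method"]
    if isinstance(methods, list):
        # List form: every method runs with its default parameters.
        methods = {name: {} for name in methods}
    for name, params in methods.items():
        if name not in SUPPORTED_METHODS:
            raise ValueError(f"unsupported predict method: {name}")
        if name == "predict_proba" and "quantiles" not in params:
            raise ValueError("predict_proba requires user-supplied quantiles")
    return {"predict_method": methods}


list_form = {"predict_method": ["predict", "predict_interval"]}
dict_form = normalize_predict_conf(list_form)
print(dict_form)
```

Normalizing to one form before attaching the configuration to the forecaster keeps later checks simple, since only the dict form needs to be handled.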

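Signature logging for non-pyfunc sktime model artifacts will not work correctly for predict_interval or predict_quantiles. Because the returned DataFrame has a MultiIndex column structure, the output of these methods in the native sktime flavor is not a recognized signature type. MLflow's infer_schema works correctly, however, when the pyfunc flavor of the model is used.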
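To see why a MultiIndex is awkward for a flat schema, note that each column of the predict_interval output is identified by a tuple of labels rather than a single name. The sketch below joins such tuples into single strings with a `__` separator, which matches the column names visible in the pyfunc output later in this notebook. `flatten_column` is a hypothetical helper for illustration, not the code sktime uses.

```python
def flatten_column(method, column):
    """Join a predict method name and a (possibly tuple) column label."""
    levels = column if isinstance(column, tuple) else (column,)
    return "__".join([method, *map(str, levels)])


# Column tuples as pandas would report them for MultiIndex columns.
interval_columns = [
    ("Coverage", 0.8, "lower"),
    ("Coverage", 0.8, "upper"),
    ("Coverage", 0.9, "lower"),
    ("Coverage", 0.9, "upper"),
]
flat_names = [flatten_column("predict_interval", c) for c in interval_columns]
print(flat_names)
```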

1. Setup#

1.1 Config#

[1]:
model_path = "model"

1.2 Imports#

[2]:
import mlflow

from sktime.datasets import load_longley
from sktime.forecasting.naive import NaiveForecaster
from sktime.split import temporal_train_test_split
from sktime.utils import mlflow_sktime

1.3 Load sample data#

[3]:
y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)

2. Usage example of the native sktime flavor and the pyfunc flavor#

2.1 Create prediction configuration for pyfunc flavor#

[4]:
coverage = [0.8, 0.9]
quantiles = [0.1, 0.9]

pyfunc_predict_conf = {
    "predict_method": {
        "predict": {},
        "predict_interval": {"coverage": coverage},
        "predict_proba": {"quantiles": quantiles},
        "predict_quantiles": {},
        "predict_var": {},
    }
}

2.2 Train and save model#

[5]:
with mlflow.start_run():
    forecaster = NaiveForecaster()
    forecaster.fit(
        y_train,
        X=X_train,
        fh=[1, 2, 3],
    )
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf

    mlflow_sktime.save_model(forecaster, model_path)

2.3 Load model#

2.3.1 Native sktime flavor#

[6]:
loaded_model = mlflow_sktime.load_model(model_path)

2.3.2 Pyfunc flavor#

[7]:
loaded_pyfunc = mlflow_sktime.pyfunc.load_model(model_path)

2.4 Generate predictions#

2.4.1 Native sktime flavor#

[8]:
loaded_model.predict(X=X_test)
[8]:
1959    66513.0
1960    66513.0
1961    66513.0
Freq: A-DEC, dtype: float64
[9]:
loaded_model.predict_interval(X=X_test, coverage=coverage)
[9]:
          Coverage
               0.8                         0.9
             lower         upper         lower         upper
1959  64719.913711  68306.086289  64211.598663  68814.401337
1960  63977.193051  69048.806949  63258.327017  69767.672983
1961  63407.283445  69618.716555  62526.855956  70499.144044
[10]:
y_pred_dist = loaded_model.predict_proba(X=X_test)
y_pred_dist_quantiles = y_pred_dist.quantile(quantiles)
y_pred_dist_quantiles
[10]:
Quantiles_0.1 Quantiles_0.9
1959 64719.914062 68306.085938
1960 63977.191406 69048.804688
1961 63407.281250 69618.718750
[11]:
loaded_model.predict_quantiles(X=X_test)
[11]:
         Quantiles
              0.05          0.95
1959  64211.598663  68814.401337
1960  63258.327017  69767.672983
1961  62526.855956  70499.144044
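Comparing the outputs above, the 0.05 and 0.95 quantiles coincide with the bounds of the 0.9 coverage interval, and the 0.1 and 0.9 quantiles from predict_proba match the 0.8 interval up to float precision. A symmetric interval with coverage c spans the quantile levels 0.5 - c/2 and 0.5 + c/2. A minimal sketch of that mapping, with `coverage_to_quantiles` as a hypothetical helper name:

```python
def coverage_to_quantiles(coverage):
    """Return the (lower, upper) quantile levels of a symmetric interval."""
    return 0.5 - coverage / 2, 0.5 + coverage / 2


for coverage in [0.8, 0.9]:
    lower, upper = coverage_to_quantiles(coverage)
    print(f"coverage={coverage}: quantiles {lower:.2f} and {upper:.2f}")
```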
[12]:
loaded_model.predict_var(X=X_test)
[12]:
0
1959 1.957628e+06
1960 3.915256e+06
1961 5.872885e+06
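The variance output can be cross-checked against the interval output. The printed values are consistent with a normal predictive distribution centred on the point forecast, in which each bound equals the mean plus a standard normal quantile times the standard deviation. This is an observation about the numbers shown in this notebook, not a description of sktime internals. The sketch uses `statistics.NormalDist` from the standard library and the rounded values printed above, so it agrees with the interval output only approximately.

```python
from statistics import NormalDist

mean = 66513.0          # point forecast for 1959
variance = 1.957628e06  # predict_var output for 1959, rounded as printed


def normal_interval(mean, variance, coverage):
    """Symmetric interval of a normal distribution with the given coverage."""
    dist = NormalDist(mu=mean, sigma=variance**0.5)
    tail = (1 - coverage) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


for coverage in [0.8, 0.9]:
    lower, upper = normal_interval(mean, variance, coverage)
    print(f"coverage={coverage}: [{lower:.1f}, {upper:.1f}]")
```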

2.4.2 Pyfunc flavor#

[13]:
loaded_pyfunc.predict(X_test)
[13]:
predict__0 predict_interval__Coverage__0.8__lower predict_interval__Coverage__0.8__upper predict_interval__Coverage__0.9__lower predict_interval__Coverage__0.9__upper predict_proba__Quantiles_0.1 predict_proba__Quantiles_0.9 predict_quantiles__Quantiles__0.05 predict_quantiles__Quantiles__0.95 predict_var__0
1959 66513.0 64719.913711 68306.086289 64211.598663 68814.401337 64719.914062 68306.085938 64211.598663 68814.401337 1.957628e+06
1960 66513.0 63977.193051 69048.806949 63258.327017 69767.672983 63977.191406 69048.804688 63258.327017 69767.672983 3.915256e+06
1961 66513.0 63407.283445 69618.716555 62526.855956 70499.144044 63407.281250 69618.718750 62526.855956 70499.144044 5.872885e+06
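Each column of the pyfunc output carries the name of the predict method as a prefix, separated from the original column labels by a double underscore. Assuming that naming pattern holds, the columns can be grouped by method with plain string handling. `group_columns_by_method` is a hypothetical helper for this example:

```python
def group_columns_by_method(columns):
    """Map each predict method to the remainder of its column names."""
    grouped = {}
    for name in columns:
        # Split on the first double underscore only.
        method, _, rest = name.partition("__")
        grouped.setdefault(method, []).append(rest)
    return grouped


columns = [
    "predict__0",
    "predict_interval__Coverage__0.8__lower",
    "predict_interval__Coverage__0.8__upper",
    "predict_proba__Quantiles_0.1",
    "predict_var__0",
]
print(group_columns_by_method(columns))
```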

3. Model deployment example#

3.1 Create experiment#

[14]:
artifact_path = "model"

mlflow.set_experiment("Test Sktime")

with mlflow.start_run() as run:
    forecaster = NaiveForecaster()
    forecaster.fit(y_train, X=X_train, fh=[1, 2, 3])
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf

    mlflow_sktime.log_model(sktime_model=forecaster, artifact_path=artifact_path)

run_id = run.info.run_id
print(f"MLflow run id: {run_id}")
2022/12/19 10:07:21 INFO mlflow.tracking.fluent: Experiment with name 'Test Sktime' does not exist. Creating a new experiment.
MLflow run id: ec94d157fe354c1bbdd4dec0898e1ed6

3.2 Deploy pyfunc model to local REST API endpoint#

  • Open a terminal window and cd into the examples directory

  • Run in the terminal: mlflow models serve -m runs:/<RUN_ID>/model --env-manager local --host <HOST>

    • Replace <RUN_ID> with the run_id and <HOST> with the network address to listen on (e.g. 127.0.0.1)

  • For more details, see: https://www.mlflow.org/docs/latest/cli.html#mlflow-models-serve
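The command can also be assembled programmatically, which avoids copy-paste mistakes with the run id. The sketch below only builds and prints the command; it does not start a server. `build_serve_command` is a hypothetical helper, the run id is the one printed in section 3.1, and only the flags from the command above are used.

```python
import shlex


def build_serve_command(run_id, host, artifact_path="model"):
    """Build the `mlflow models serve` argument list for a logged model."""
    model_uri = f"runs:/{run_id}/{artifact_path}"
    return [
        "mlflow", "models", "serve",
        "-m", model_uri,
        "--env-manager", "local",
        "--host", host,
    ]


command = build_serve_command("ec94d157fe354c1bbdd4dec0898e1ed6", "127.0.0.1")
print(shlex.join(command))
# To launch the server from Python, pass `command` to subprocess.run.
```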

3.3 Request predictions from local REST API endpoint#

3.3.1 JSON input using the dataframe_split field with a pandas DataFrame in split orientation#

[15]:
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"

X_test = X_test.reset_index(drop=True)
json_data = {"dataframe_split": X_test.to_dict(orient="split")}
print(json_data)

# Uncomment the lines below to run the prediction request
# import requests
# response = requests.post(url, json=json_data)
# response.json()
{'dataframe_split': {'index': [0, 1, 2, 3], 'columns': ['GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP'], 'data': [[112.6, 482704.0, 3813.0, 2552.0, 123366.0], [114.2, 502601.0, 3931.0, 2514.0, 125368.0], [115.7, 518173.0, 4806.0, 2572.0, 127852.0], [116.9, 554894.0, 4007.0, 2827.0, 130081.0]]}}
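For readers who prefer not to depend on `requests`, the same call can be prepared with the standard library. The sketch below builds the request object without sending it, so it runs without a server. The payload is a shortened stand-in for `json_data`, and `application/json` is the content type assumed here for JSON input.

```python
import json
from urllib.request import Request

url = "http://127.0.0.1:5000/invocations"
# Shortened stand-in for the `json_data` payload printed above.
payload = {
    "dataframe_split": {
        "index": [0],
        "columns": ["GNPDEFL", "GNP", "UNEMP", "ARMED", "POP"],
        "data": [[112.6, 482704.0, 3813.0, 2552.0, 123366.0]],
    }
}

request = Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(request.get_method(), request.full_url)
# With the server running, urllib.request.urlopen(request) would send it.
```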

3.3.2 JSON input using the dataframe_records field with a pandas DataFrame in records orientation#

[16]:
json_data = {"dataframe_records": X_test.to_dict(orient="records")}
print(json_data)

# Uncomment the lines below to run the prediction request
# response = requests.post(url, json=json_data)
# response.json()
{'dataframe_records': [{'GNPDEFL': 112.6, 'GNP': 482704.0, 'UNEMP': 3813.0, 'ARMED': 2552.0, 'POP': 123366.0}, {'GNPDEFL': 114.2, 'GNP': 502601.0, 'UNEMP': 3931.0, 'ARMED': 2514.0, 'POP': 125368.0}, {'GNPDEFL': 115.7, 'GNP': 518173.0, 'UNEMP': 4806.0, 'ARMED': 2572.0, 'POP': 127852.0}, {'GNPDEFL': 116.9, 'GNP': 554894.0, 'UNEMP': 4007.0, 'ARMED': 2827.0, 'POP': 130081.0}]}
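The two JSON layouts carry the same table. As a small illustration of how they relate, the sketch below converts a split-oriented dictionary into a list of records in plain Python. `split_to_records` is a hypothetical helper, and the data is a two-row, two-column excerpt of the payloads shown above.

```python
def split_to_records(split):
    """Convert a split-oriented dict into a list of row dicts."""
    return [dict(zip(split["columns"], row)) for row in split["data"]]


split = {
    "index": [0, 1],
    "columns": ["GNPDEFL", "GNP"],
    "data": [[112.6, 482704.0], [114.2, 502601.0]],
}
records = split_to_records(split)
print(records)
```

The records layout drops the index, which is why the split layout is the one to use when row labels matter.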

3.3.3 CSV input using a valid pd.DataFrame CSV representation#

[17]:
headers = {
    "Content-Type": "text/csv",
}
data = X_test.to_csv()
print(data)

# Uncomment the lines below to run the prediction request
# response = requests.post(url, headers=headers, data=data)
# response.json()
,GNPDEFL,GNP,UNEMP,ARMED,POP
0,112.6,482704.0,3813.0,2552.0,123366.0
1,114.2,502601.0,3931.0,2514.0,125368.0
2,115.7,518173.0,4806.0,2572.0,127852.0
3,116.9,554894.0,4007.0,2827.0,130081.0
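Note that `X_test.to_csv()` writes the index as a leading column with an empty header, as the output above shows. Parsing the text with the standard `csv` module makes this visible. Whether the server treats that column as an additional feature may depend on the MLflow version and is not verified here; passing `index=False` to `to_csv` would omit it.

```python
import csv
import io

# First two lines of the CSV text printed above.
csv_text = (
    ",GNPDEFL,GNP,UNEMP,ARMED,POP\n"
    "0,112.6,482704.0,3813.0,2552.0,123366.0\n"
)
header, first_row = list(csv.reader(io.StringIO(csv_text)))
print(header)     # the first header field is the empty string
print(first_row)  # the first value is the row index
```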

