MLflow#
The sktime custom model flavor enables logging of sktime models in MLflow format via the sktime.utils.mlflow_sktime.save_model() and sktime.utils.mlflow_sktime.log_model() methods. These methods also add the pyfunc flavor to the MLflow models that they produce, allowing the model to be interpreted as a generic Python function for inference via sktime.utils.mlflow_sktime.pyfunc.load_model(). The loaded PyFunc model can only be scored with a DataFrame input. You can also use the sktime.utils.mlflow_sktime.load_model() method to load MLflow models with the sktime model flavor in native sktime format.
The pyfunc flavor of the model supports sktime's prediction methods predict, predict_interval, predict_proba, predict_quantiles, and predict_var.
The interface for generating predictions with an sktime model loaded as a pyfunc requires passing the exogenous regressor as a Pandas DataFrame to the pyfunc.predict() method (an empty DataFrame must be passed if no exogenous regressor is used). The configuration of predict methods and parameter values passed to those methods is defined by a dictionary that is saved as an attribute of the fitted sktime model instance. If no prediction configuration is defined, pyfunc.predict() will return the output of the sktime predict method. Note that for the pyfunc flavor, the forecasting horizon fh must be passed to the fit method; a short sketch of this follows.
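A minimal, self-contained sketch of this interface, assuming no exogenous regressors are used (so an empty DataFrame is passed for scoring); the dataset and save path are illustrative only:

import pandas as pd

from sktime.datasets import load_airline
from sktime.forecasting.naive import NaiveForecaster
from sktime.utils import mlflow_sktime

# for the pyfunc flavor, the forecasting horizon fh must be passed to fit
forecaster = NaiveForecaster()
forecaster.fit(load_airline(), fh=[1, 2, 3])
mlflow_sktime.save_model(forecaster, "naive_model")

loaded_pyfunc = mlflow_sktime.pyfunc.load_model("naive_model")
# no exogenous regressors were used, so an empty DataFrame is passed
y_pred = loaded_pyfunc.predict(pd.DataFrame())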
The predict methods and parameter values for the pyfunc flavor can be defined in one of two ways (a short sketch follows this list):

- Dict[str, dict], if parameter values are passed to pyfunc.predict(), for example {"predict_method": {"predict": {}, "predict_interval": {"coverage": [0.1, 0.9]}}}
- Dict[str, list], using the default parameters of the predict methods, for example {"predict_method": ["predict", "predict_interval"]} (Note: when including the predict_proba method, the former approach must be followed, as the quantiles parameter has to be provided by the user)

If no prediction configuration is defined, pyfunc.predict() will return the output of the sktime predict() method.
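A minimal sketch contrasting the two configuration styles; the method names and parameter values are illustrative only:

# Dict[str, dict]: parameter values are passed explicitly for each prediction method
conf_with_params = {
    "predict_method": {
        "predict": {},
        "predict_interval": {"coverage": [0.1, 0.9]},
    }
}

# Dict[str, list]: the listed prediction methods are called with their default parameters
conf_with_defaults = {"predict_method": ["predict", "predict_interval"]}

# either dictionary is attached to the fitted forecaster before saving, e.g.:
# forecaster.pyfunc_predict_conf = conf_with_params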
For the sktime model artifact of the non-pyfunc flavor, signature logging will not work when handling predict_interval or predict_quantiles. The output of these methods is not a recognized signature type for the native sktime model flavor because the returned DataFrame has a MultiIndex column structure. However, MLflow's infer_schema will work when the pyfunc flavor of the model is used (a sketch is shown after the pyfunc predictions in section 2.4.2 below).
1. Setup#
1.1 Configuration#
[1]:
model_path = "model"
1.2 Imports#
[2]:
import mlflow
from sktime.datasets import load_longley
from sktime.forecasting.naive import NaiveForecaster
from sktime.split import temporal_train_test_split
from sktime.utils import mlflow_sktime
1.3 Load sample data#
[3]:
y, X = load_longley()
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)
2. Example usage of native sktime flavor and pyfunc flavor#
2.1 Create prediction configuration for pyfunc flavor#
[4]:
coverage = [0.8, 0.9]
quantiles = [0.1, 0.9]
pyfunc_predict_conf = {
    "predict_method": {
        "predict": {},
        "predict_interval": {"coverage": coverage},
        "predict_proba": {"quantiles": quantiles},
        "predict_quantiles": {},
        "predict_var": {},
    }
}
2.2 Train and save the model#
[5]:
with mlflow.start_run():
    forecaster = NaiveForecaster()
    forecaster.fit(
        y_train,
        X=X_train,
        fh=[1, 2, 3],
    )
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf
    mlflow_sktime.save_model(forecaster, model_path)
2.3 Load the model#
2.3.1 Native sktime flavor#
[6]:
loaded_model = mlflow_sktime.load_model(model_path)
2.3.2 Pyfunc flavor#
[7]:
loaded_pyfunc = mlflow_sktime.pyfunc.load_model(model_path)
2.4 Generate predictions#
2.4.1 Native sktime flavor#
[8]:
loaded_model.predict(X=X_test)
[8]:
1959 66513.0
1960 66513.0
1961 66513.0
Freq: A-DEC, dtype: float64
[9]:
loaded_model.predict_interval(X=X_test, coverage=coverage)
[9]:
|      | Coverage 0.8 (lower) | Coverage 0.8 (upper) | Coverage 0.9 (lower) | Coverage 0.9 (upper) |
|------|----------------------|----------------------|----------------------|----------------------|
| 1959 | 64719.913711 | 68306.086289 | 64211.598663 | 68814.401337 |
| 1960 | 63977.193051 | 69048.806949 | 63258.327017 | 69767.672983 |
| 1961 | 63407.283445 | 69618.716555 | 62526.855956 | 70499.144044 |
[10]:
y_pred_dist = loaded_model.predict_proba(X=X)
y_pred_dist_quantiles = y_pred_dist.quantile(quantiles)
y_pred_dist_quantiles
[10]:
|      | Quantiles_0.1 | Quantiles_0.9 |
|------|---------------|---------------|
| 1959 | 64719.914062 | 68306.085938 |
| 1960 | 63977.191406 | 69048.804688 |
| 1961 | 63407.281250 | 69618.718750 |
[11]:
loaded_model.predict_quantiles(X=X_test)
[11]:
|      | Quantiles 0.05 | Quantiles 0.95 |
|------|----------------|----------------|
| 1959 | 64211.598663 | 68814.401337 |
| 1960 | 63258.327017 | 69767.672983 |
| 1961 | 62526.855956 | 70499.144044 |
[12]:
loaded_model.predict_var(X=X_test)
[12]:
|      | 0 |
|------|---|
| 1959 | 1.957628e+06 |
| 1960 | 3.915256e+06 |
| 1961 | 5.872885e+06 |
2.4.2 Pyfunc flavor#
[13]:
loaded_pyfunc.predict(X_test)
[13]:
|      | predict__0 | predict_interval__Coverage__0.8__lower | predict_interval__Coverage__0.8__upper | predict_interval__Coverage__0.9__lower | predict_interval__Coverage__0.9__upper | predict_proba__Quantiles_0.1 | predict_proba__Quantiles_0.9 | predict_quantiles__Quantiles__0.05 | predict_quantiles__Quantiles__0.95 | predict_var__0 |
|------|------------|----------------------------------------|----------------------------------------|----------------------------------------|----------------------------------------|------------------------------|------------------------------|------------------------------------|------------------------------------|----------------|
| 1959 | 66513.0 | 64719.913711 | 68306.086289 | 64211.598663 | 68814.401337 | 64719.914062 | 68306.085938 | 64211.598663 | 68814.401337 | 1.957628e+06 |
| 1960 | 66513.0 | 63977.193051 | 69048.806949 | 63258.327017 | 69767.672983 | 63977.191406 | 69048.804688 | 63258.327017 | 69767.672983 | 3.915256e+06 |
| 1961 | 66513.0 | 63407.283445 | 69618.716555 | 62526.855956 | 70499.144044 | 63407.281250 | 69618.718750 | 62526.855956 | 70499.144044 | 5.872885e+06 |
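As noted in the introduction, signature inference works on the pyfunc flavor output because its columns are flat. A minimal sketch reusing the objects defined above; infer_signature is a standard MLflow utility, while passing the resulting signature to mlflow_sktime.log_model is an assumption following the usual MLflow flavor convention and is not shown in this example:

from mlflow.models import infer_signature

# infer a signature from the flat, single-level columns of the pyfunc output
y_pred_pyfunc = loaded_pyfunc.predict(X_test)
signature = infer_signature(X_test, y_pred_pyfunc)

# the signature could then be passed when logging the model, e.g.:
# mlflow_sktime.log_model(sktime_model=forecaster, artifact_path="model", signature=signature)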
3. Model deployment example#
3.1 Create experiment#
[14]:
artifact_path = "model"

mlflow.set_experiment("Test Sktime")

with mlflow.start_run() as run:
    forecaster = NaiveForecaster()
    forecaster.fit(y_train, X=X_train, fh=[1, 2, 3])
    forecaster.pyfunc_predict_conf = pyfunc_predict_conf
    mlflow_sktime.log_model(sktime_model=forecaster, artifact_path=artifact_path)
    run_id = run.info.run_id

print(f"MLflow run id: {run_id}")
2022/12/19 10:07:21 INFO mlflow.tracking.fluent: Experiment with name 'Test Sktime' does not exist. Creating a new experiment.
MLflow run id: ec94d157fe354c1bbdd4dec0898e1ed6
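The logged model can also be loaded back via its run-relative artifact URI. A minimal sketch, assuming that mlflow_sktime.load_model and mlflow_sktime.pyfunc.load_model accept a runs:/ model URI like the built-in MLflow flavors:

# construct the model URI from the run id and artifact path logged above
model_uri = f"runs:/{run_id}/{artifact_path}"

loaded_model_from_run = mlflow_sktime.load_model(model_uri)
loaded_pyfunc_from_run = mlflow_sktime.pyfunc.load_model(model_uri)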
3.2 Deploy the pyfunc model to a local REST API endpoint#
Open a terminal window, cd into the examples directory, and run the following command in the terminal:

mlflow models serve -m runs:/<RUN_ID>/model --env-manager local --host <HOST>

where <RUN_ID> is replaced with the run_id printed above and <HOST> with the network address to listen on (e.g. 127.0.0.1).

For more details see: https://www.mlflow.org/docs/latest/cli.html#mlflow-models-serve
3.3 Request predictions from the local REST API endpoint#
3.3.1 JSON input using the dataframe_split field with a pandas DataFrame in split orientation#
[15]:
host = "127.0.0.1"
url = f"http://{host}:5000/invocations"
X_test = X_test.reset_index(drop=True)
json_data = {"dataframe_split": X_test.to_dict(orient="split")}
print(json_data)
# Uncomment the lines below to run the prediction request
# import requests
# response = requests.post(url, json=json_data)
# response.json()
{'dataframe_split': {'index': [0, 1, 2, 3], 'columns': ['GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP'], 'data': [[112.6, 482704.0, 3813.0, 2552.0, 123366.0], [114.2, 502601.0, 3931.0, 2514.0, 125368.0], [115.7, 518173.0, 4806.0, 2572.0, 127852.0], [116.9, 554894.0, 4007.0, 2827.0, 130081.0]]}}
3.3.2 JSON input using the dataframe_records field with a pandas DataFrame in records orientation#
[16]:
json_data = {"dataframe_records": X_test.to_dict(orient="records")}
print(json_data)
# Uncomment the lines below to run the prediction request
# response = requests.post(url, json=json_data)
# response.json()
{'dataframe_records': [{'GNPDEFL': 112.6, 'GNP': 482704.0, 'UNEMP': 3813.0, 'ARMED': 2552.0, 'POP': 123366.0}, {'GNPDEFL': 114.2, 'GNP': 502601.0, 'UNEMP': 3931.0, 'ARMED': 2514.0, 'POP': 125368.0}, {'GNPDEFL': 115.7, 'GNP': 518173.0, 'UNEMP': 4806.0, 'ARMED': 2572.0, 'POP': 127852.0}, {'GNPDEFL': 116.9, 'GNP': 554894.0, 'UNEMP': 4007.0, 'ARMED': 2827.0, 'POP': 130081.0}]}
3.3.3 CSV input using a valid CSV representation of a pd.DataFrame#
[17]:
headers = {
"Content-Type": "text/csv",
}
data = X_test.to_csv()
print(data)
# Uncomment the lines below to run the prediction request
# response = requests.post(url, headers=headers, data=data)
# response.json()
,GNPDEFL,GNP,UNEMP,ARMED,POP
0,112.6,482704.0,3813.0,2552.0,123366.0
1,114.2,502601.0,3931.0,2514.0,125368.0
2,115.7,518173.0,4806.0,2572.0,127852.0
3,116.9,554894.0,4007.0,2827.0,130081.0