BaseBenchmark

class BaseBenchmark(id_format: str | None = None)

Base class for benchmarks.

A benchmark consists of a set of tasks and a set of estimators.

Parameters:

id_format: str, optional (default=None): A regular expression that task/estimator IDs must match. If None, no format is enforced on task/estimator IDs.
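The effect of `id_format` can be illustrated with a minimal sketch using Python's standard `re` module. The helper `check_id` and the pattern `ID_FORMAT` are hypothetical names introduced here for illustration, and the use of `re.fullmatch` is an assumption; the library may apply the pattern differently (for example with `re.match` or `re.search`).

```python
import re


def check_id(entity_id, id_format=None):
    """Hypothetical helper: report whether an ID is acceptable under id_format.

    Assumption: the whole ID must match the pattern (re.fullmatch).
    """
    if id_format is None:
        # No format is enforced when id_format is None.
        return True
    return re.fullmatch(id_format, entity_id) is not None


# Hypothetical pattern: a name made of letters, then "-v" and a version number.
ID_FORMAT = r"[A-Za-z]+-v[0-9]+"

print(check_id("NaiveForecaster-v1", ID_FORMAT))  # matches the pattern
print(check_id("naive forecaster", ID_FORMAT))    # contains a space, no version
print(check_id("anything goes"))                  # no pattern, nothing enforced
```

A strict pattern like this one is mainly useful for keeping result files consistent when many tasks and estimators are registered by different people.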

Methods

`add_estimator`(estimator[, estimator_id])	Register an estimator to the benchmark.
`run`(output_file[, force_rerun])	Run the benchmark for all tasks and estimators.

add_estimator(estimator: BaseEstimator, estimator_id: str | None = None)

Register an estimator to the benchmark.

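Parameters:

estimator: Dict, List or BaseEstimator object: The estimator(s) to add to the benchmark. If a Dict, the keys are the estimator_ids used as custom identifiers and the values are the estimators. If a List, each element is an estimator, and the estimator_ids are generated automatically from the estimators' class names.
estimator_id: str, optional (default=None): Identifier for the estimator. If not given, the estimator's class name is used.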
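The ID rules described above can be sketched with a small stand-in class. This is not the library's implementation: `EstimatorRegistry`, `MeanModel`, and `LastValueModel` are hypothetical names used only to show how a single object, a List, and a Dict would each end up with an identifier.

```python
class EstimatorRegistry:
    """Hypothetical stand-in that mimics the ID rules described above."""

    def __init__(self):
        self.estimators = {}

    def add_estimator(self, estimator, estimator_id=None):
        if isinstance(estimator, dict):
            # Dict: keys are the custom IDs, values are the estimators.
            for est_id, est in estimator.items():
                self.estimators[est_id] = est
        elif isinstance(estimator, list):
            # List: each ID is generated from the element's class name.
            for est in estimator:
                self.estimators[type(est).__name__] = est
        else:
            # Single object: explicit ID if given, else the class name.
            key = estimator_id or type(estimator).__name__
            self.estimators[key] = estimator


class MeanModel:
    """Placeholder estimator used only for this sketch."""


class LastValueModel:
    """Placeholder estimator used only for this sketch."""


registry = EstimatorRegistry()
registry.add_estimator(MeanModel())                                  # ID from class name
registry.add_estimator(LastValueModel(), estimator_id="last-value")  # explicit ID
registry.add_estimator([LastValueModel()])                           # List form
registry.add_estimator({"mean-tuned": MeanModel()})                  # Dict form

registry_ids = sorted(registry.estimators)
print(registry_ids)
```

Passing a Dict or an explicit `estimator_id` is the safer choice when the same estimator class is registered more than once, since class-name IDs would otherwise coincide.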

run(output_file: str, force_rerun: str | list[str] = 'none')

Run the benchmark for all tasks and estimators.

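Parameters:

output_file: str: Path where the results are saved.
force_rerun: Union[str, list[str]], optional (default="none"): If "none", validation is skipped for results that already exist. If "all", validation is run for all tasks and models. If a list of str, validation is run for the tasks and models named in the list.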
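The three `force_rerun` modes can be read as a per-entry decision, sketched below. `needs_run` is a hypothetical helper, not a library function, and it rests on one assumption the description does not state: when a list is given, entries that have no saved result yet are still run.

```python
def needs_run(entity_id, has_result, force_rerun="none"):
    """Hypothetical helper: decide whether a task/model must be (re)validated.

    Assumption: entries with no saved result are always run, also in list mode.
    """
    if isinstance(force_rerun, str):
        if force_rerun == "all":
            # Rerun everything, regardless of existing results.
            return True
        if force_rerun == "none":
            # Skip entries whose results already exist.
            return not has_result
        raise ValueError('force_rerun must be "none", "all", or a list of str')
    # List mode: rerun the named entries; others only if they lack a result.
    return entity_id in force_rerun or not has_result


print(needs_run("task-a", has_result=True))                          # default "none"
print(needs_run("task-a", has_result=True, force_rerun="all"))
print(needs_run("task-a", has_result=True, force_rerun=["task-a"]))
print(needs_run("task-b", has_result=True, force_rerun=["task-a"]))
```

The default of "none" makes repeated calls with the same `output_file` cheap, since only missing results are computed.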