EasyLLM: Seamless Switching Between OpenAI and Hugging Face Clients


Source: 闪耀之星AK

2023-08-11 18:11:35


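Preface

In less than a year, large language models (LLMs) have proliferated both in China and abroad, and excellent models have appeared in both the open-source and closed-source camps. When building applications on top of them, however, every model differs to some degree in deployment, training, fine-tuning, API design, and prompt format. A product that has to integrate several LLMs, or switch quickly between them, becomes more complex to build, less convenient to use, and harder to maintain.

First, using and deploying LLMs is relatively complicated. Providers and frameworks differ from one another, so users face tedious configuration and adaptation work. For example, OpenAI's ChatCompletion, Completion, and Embedding APIs may be incompatible with the corresponding Hugging Face functionality, so the code has to be modified by hand for each model.

Second, prompt formats are a problem. Different LLMs may expect different prompt formats, so switching models means converting prompts from one format to another. This adds extra work and a learning cost.

Third, response time matters. In some scenarios, especially real-time interaction, waiting for the LLM to finish generating the entire result causes noticeable delay.

EasyLLM was created to address these problems.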

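1. Introduction to EasyLLM

EasyLLM is an open-source project that aims to simplify working with LLMs. It provides compatible clients, so that switching between LLMs takes a one-line code change. It also provides prompt helpers that convert prompts between the formats of different LLMs. Finally, it supports streaming, so partial results are available immediately instead of only after the whole response has been generated.

The first release of EasyLLM implements clients that are compatible with OpenAI's Completion API. This means you can replace openai.ChatCompletion, openai.Completion, and openai.Embedding with huggingface.ChatCompletion, huggingface.Completion, and huggingface.Embedding by changing a single line of code.

With EasyLLM, different LLMs can be used more conveniently and flexibly. The following sections cover its main features and how to use them.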

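2. EasyLLM Features

The current features are:

- Compatible clients: clients compatible with OpenAI's ChatCompletion, Completion, and Embedding APIs. Switch between LLMs by changing one line of code.
- Prompt helpers: utilities that convert prompts between the formats of different LLMs, for example from the OpenAI messages format to the prompt format of models such as LLaMA.
- Streaming support: stream completions from your LLM instead of waiting for the whole response. Well suited to chat interfaces.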
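To make the streaming feature more concrete, here is a minimal sketch of how a chat interface might consume partial results. It does not call EasyLLM. `fake_stream` is a hypothetical stand-in that yields chunks shaped like OpenAI's streaming format (one `delta` dict per choice), which is the shape an OpenAI-compatible client would be expected to produce; the real chunk layout should be checked against the EasyLLM documentation.

```python
from typing import Dict, Iterable, Iterator


def fake_stream(text: str) -> Iterator[Dict]:
    """Hypothetical stand-in for a streaming response: one chunk per word."""
    yield {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}
    for word in text.split(" "):
        yield {"choices": [{"index": 0, "delta": {"content": word + " "}}]}
    yield {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}


def collect_stream(chunks: Iterable[Dict]) -> str:
    """Concatenate the `content` piece of every chunk as it arrives."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        piece = delta.get("content")
        if piece:
            # A chat UI would render `piece` immediately at this point.
            parts.append(piece)
    return "".join(parts).strip()


reply = collect_stream(fake_stream("The sun is a star."))
print(reply)
```

The point of the loop is that each piece can be displayed as soon as it arrives, while the full reply is still assembled for later use.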

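Planned so far:

- evol_instruct (work in progress): a method that uses an LLM to create instructions, evolving simple instructions into complex ones.
- prompt_utils: helper methods that convert between prompt formats such as OpenAI messages and the prompts of open-source models such as Llama 2.
- sagemaker: a client for interacting with LLMs deployed on Amazon SageMaker.

3. Getting Started with EasyLLM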

Install EasyLLM with pip:

    pip install easyllm

Then import a client and start using it:

    from easyllm.clients import huggingface

    # Define the prompt format to use
    huggingface.prompt_builder = "llama2"
    # huggingface.api_key = "hf_xxx"  # change the API key if needed

    response = huggingface.ChatCompletion.create(
        model="meta-llama/Llama-2-70b-chat-hf",
        messages=[
            {"role": "system", "content": "\nYou are a helpful assistant speaking like a pirate. argh!"},
            {"role": "user", "content": "What is the sun?"},
        ],
        temperature=0.9,
        top_p=0.6,
        max_tokens=256,
    )

    print(response)

Output:

    {
      "id": "hf-lVC2iTMkFJ",
      "object": "chat.completion",
      "created": 1690661144,
      "model": "meta-llama/Llama-2-70b-chat-hf",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": " Arrrr, the sun be a big ol' ball o' fire in the sky, me hearty! It be the source o' light and warmth for our fair planet, and it be a mighty powerful force, savvy? Without the sun, we'd be sailin' through the darkness, lost and cold, so let's give a hearty \"Yarrr!\" for the sun, me hearties! Arrrr!"
          },
          "finish_reason": null
        }
      ],
      "usage": {
        "prompt_tokens": 111,
        "completion_tokens": 299,
        "total_tokens": 410
      }
    }
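Since the response follows the OpenAI layout, the reply text and the token counts can be read with plain dictionary access. The sketch below works on a hand-written dictionary that copies the shape of the output above (with the reply shortened); `first_reply` and `token_usage` are hypothetical helper names, not part of EasyLLM.

```python
from typing import Any, Dict, Tuple

# A shortened copy of the response shown above, written as a Python dict.
response: Dict[str, Any] = {
    "id": "hf-lVC2iTMkFJ",
    "object": "chat.completion",
    "model": "meta-llama/Llama-2-70b-chat-hf",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": " Arrrr, the sun be a big ol' ball o' fire in the sky, me hearty!",
            },
            "finish_reason": None,
        }
    ],
    "usage": {"prompt_tokens": 111, "completion_tokens": 299, "total_tokens": 410},
}


def first_reply(resp: Dict[str, Any]) -> str:
    """Return the text of the first choice, without surrounding whitespace."""
    return resp["choices"][0]["message"]["content"].strip()


def token_usage(resp: Dict[str, Any]) -> Tuple[int, int, int]:
    """Return (prompt, completion, total) token counts."""
    usage = resp["usage"]
    return usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"]


reply = first_reply(response)
prompt_tokens, completion_tokens, total_tokens = token_usage(response)
print(reply)
print(prompt_tokens, completion_tokens, total_tokens)
```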

See the documentation for more examples and detailed usage instructions. The code is available on GitHub.

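4. EasyLLM Clients

In the context of EasyLLM, a "client" is the code that interacts with a specific LLM API, such as OpenAI's. The currently supported clients are:

- ChatCompletion: interacts with LLMs that are compatible with the OpenAI ChatCompletion API.
- Completion: interacts with LLMs that are compatible with the OpenAI Completion API.
- Embedding: interacts with LLMs that are compatible with the OpenAI Embedding API.

5. Hugging Face-Compatible Client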

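EasyLLM provides a client for connecting to Hugging Face models. The client works with the Hugging Face Inference API, Hugging Face Inference Endpoints, or any web service that runs Text Generation Inference or exposes a compatible API endpoint.

- huggingface.ChatCompletion: a client for Hugging Face models that is compatible with the OpenAI ChatCompletion API.
- huggingface.Completion: a client for Hugging Face models that is compatible with the OpenAI Completion API.
- huggingface.Embedding: a client for Hugging Face models that is compatible with the OpenAI Embedding API.

5.1 huggingface.ChatCompletion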

The huggingface.ChatCompletion client interacts with Hugging Face models running on Text Generation Inference through an interface compatible with the OpenAI ChatCompletion API.

    from easyllm.clients import huggingface

    # The huggingface module automatically loads the Hugging Face API key from the
    # HUGGINGFACE_TOKEN environment variable or the Hugging Face CLI configuration file.
    # huggingface.api_key = "hf_xxx"
    huggingface.prompt_builder = "llama2"

    response = huggingface.ChatCompletion.create(
        model="meta-llama/Llama-2-70b-chat-hf",
        messages=[
            {"role": "system", "content": "\nYou are a helpful, respectful and honest assistant."},
            {"role": "user", "content": "Knock knock."},
        ],
        temperature=0.9,
        top_p=0.6,
        max_tokens=1024,
    )

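Supported parameters:

- model: the model used to generate the completion. If not provided, defaults to the base URL.
- messages: List[ChatMessage], the list of chat messages used to generate the completion.
- temperature: the sampling temperature. Defaults to 0.9.
- top_p: the top_p parameter. Defaults to 0.6.
- top_k: the top_k parameter. Defaults to 10.
- n: the number of completions to generate. Defaults to 1.
- max_tokens: the maximum number of tokens to generate. Defaults to 1024.
- stop: the stop sequence(s) for generation. Defaults to None.
- stream: whether to stream the completion. Defaults to False.
- frequency_penalty: the frequency penalty. Defaults to 1.0.
- debug: whether to enable debug logging. Defaults to False.

5.2 huggingface.Completion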

The huggingface.Completion client interacts with Hugging Face models running on Text Generation Inference through an interface compatible with the OpenAI Completion API.

    from easyllm.clients import huggingface

    # The huggingface module automatically loads the Hugging Face API key from the
    # HUGGINGFACE_TOKEN environment variable or the Hugging Face CLI configuration file.
    # huggingface.api_key = "hf_xxx"
    huggingface.prompt_builder = "llama2"

    response = huggingface.Completion.create(
        model="meta-llama/Llama-2-70b-chat-hf",
        prompt="What is the meaning of life?",
        temperature=0.9,
        top_p=0.6,
        max_tokens=1024,
    )

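Supported parameters:

- model: the model used to generate the completion. If not provided, defaults to the base URL.
- prompt: the text to complete. If prompt_builder is set, the prompt is formatted with it.
- temperature: the sampling temperature. Defaults to 0.9.
- top_p: the top_p parameter. Defaults to 0.6.
- top_k: the top_k parameter. Defaults to 10.
- n: the number of completions to generate. Defaults to 1.
- max_tokens: the maximum number of tokens to generate. Defaults to 1024.
- stop: the stop sequence(s) for generation. Defaults to None.
- stream: whether to stream the completion. Defaults to False.
- frequency_penalty: the frequency penalty. Defaults to 1.0.
- debug: whether to enable debug logging. Defaults to False.
- echo: whether to echo the prompt. Defaults to False.
- logprobs: whether to return log probabilities. Defaults to None.

5.3 huggingface.Embedding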

The huggingface.Embedding client interacts with Hugging Face models served as an API through an interface compatible with the OpenAI Embedding API.

    from easyllm.clients import huggingface

    # The huggingface module automatically loads the Hugging Face API key from the
    # HUGGINGFACE_TOKEN environment variable or the Hugging Face CLI configuration file.
    # huggingface.api_key = "hf_xxx"

    embedding = huggingface.Embedding.create(
        model="sentence-transformers/all-MiniLM-L6-v2",
        input="What is the meaning of life?",
    )

    # Dimension of the returned embedding vector
    len(embedding["data"][0]["embedding"])

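Supported parameters:

- model: the model used to create the embedding. If not provided, defaults to the base URL.
- input: Union[str, List[str]], the document(s) to embed.

5.4 Environment Configuration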

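The client can be configured by setting Hugging Face environment variables or by overriding the defaults. The following subsections show how to adjust the HF token, the URL, and the prompt builder.

5.4.1 Setting the HF Token

By default, the huggingface client tries to read the HUGGINGFACE_TOKEN environment variable. If it is not set, the client tries to read the token from the ~/.huggingface folder. If no token is found there either, no token is used.

Alternatively, you can set the token manually by assigning huggingface.api_key.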
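The lookup order described above can be sketched as a small function. This is an illustration only, not EasyLLM's code: `resolve_token` is a hypothetical name, the token file name is an assumption (the article only mentions the ~/.huggingface folder), and letting a manually set key take precedence over the environment is also an assumption.

```python
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_token(
    explicit: Optional[str], env: Mapping[str, str], token_file: Path
) -> Optional[str]:
    """Pick a token: manual value, then environment variable, then file, else None."""
    if explicit:
        return explicit  # assumption: a manually set key wins
    if env.get("HUGGINGFACE_TOKEN"):
        return env["HUGGINGFACE_TOKEN"]
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None  # no token: requests are sent unauthenticated


with tempfile.TemporaryDirectory() as tmp:
    token_file = Path(tmp) / "token"  # hypothetical file name inside the folder
    no_token = resolve_token(None, {}, token_file)
    token_file.write_text("hf_from_file\n")
    from_file = resolve_token(None, {}, token_file)
    from_env = resolve_token(None, {"HUGGINGFACE_TOKEN": "hf_from_env"}, token_file)
    manual = resolve_token("hf_manual", {"HUGGINGFACE_TOKEN": "hf_from_env"}, token_file)

print(no_token, from_file, from_env, manual)
```

Passing the environment and the file path as arguments keeps the function easy to test without touching the real home directory.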

Setting the API key manually:

    from easyllm.clients import huggingface

    huggingface.api_key = "hf_xxx"

    res = huggingface.ChatCompletion.create(...)

Using the environment variable:

    import os

    # Set the variable before importing the client
    os.environ["HUGGINGFACE_TOKEN"] = "hf_xxx"

    from easyllm.clients import huggingface

5.4.2 Changing the URL

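By default, the huggingface client tries to read the HUGGINGFACE_API_BASE environment variable. If it is not set, the client uses the default URL:

https://api-inference.huggingface.co/models

Overriding it is useful when you want to use a different URL such as https://zj5lt7pmzqzbp0d1.us-east-1.aws.endpoints.huggingface.cloud, a local URL such as http://localhost:8000, or a Hugging Face Inference Endpoint.

You can also set the URL manually by assigning huggingface.api_base. If you set a custom URL, you must leave the model parameter empty.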
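The rules above can be summarised in a short sketch. `resolve_url` is a hypothetical helper, not EasyLLM's implementation; raising an error when a model is combined with a custom URL is this sketch's way of enforcing the rule, and the real client may behave differently.

```python
from typing import Mapping, Optional

DEFAULT_API_BASE = "https://api-inference.huggingface.co/models"


def resolve_url(
    model: Optional[str], env: Mapping[str, str], api_base: Optional[str] = None
) -> str:
    """Return the URL a request would be sent to under the rules described above."""
    base = (api_base or env.get("HUGGINGFACE_API_BASE") or DEFAULT_API_BASE).rstrip("/")
    if model is None:
        return base  # a custom endpoint already identifies the model
    if base != DEFAULT_API_BASE:
        raise ValueError("leave `model` empty when a custom URL is set")
    return f"{base}/{model}"  # the hosted Inference API addresses models by id


hosted = resolve_url("meta-llama/Llama-2-70b-chat-hf", env={})
local = resolve_url(None, env={"HUGGINGFACE_API_BASE": "http://localhost:8000"})
print(hosted)
print(local)
```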

Setting the API base manually:

    from easyllm.clients import huggingface

    huggingface.api_base = "https://my-url"

    res = huggingface.ChatCompletion.create(...)

Using the environment variable:

    import os

    # Set the variable before importing the client
    os.environ["HUGGINGFACE_API_BASE"] = "https://my-url"

    from easyllm.clients import huggingface

5.4.3 Building Prompts

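By default, the huggingface client tries to read the HUGGINGFACE_PROMPT environment variable and map its value to an entry in the PROMPT_MAPPING dictionary. If the variable is not set, the default prompt builder is used. You can also set the prompt builder manually.

Setting the prompt builder manually:

    from easyllm.clients import huggingface

    huggingface.prompt_builder = "llama2"

    res = huggingface.ChatCompletion.create(...)

Using the environment variable:

    import os

    # Set the variable before importing the client
    os.environ["HUGGINGFACE_PROMPT"] = "llama2"

    from easyllm.clients import huggingface

6. Migrating from OpenAI to Hugging Face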

Migrating from OpenAI to Hugging Face is easy. Change the import statement and the client you use, and optionally set a prompt builder.

    - import openai
    + from easyllm.clients import huggingface
    + huggingface.prompt_builder = "llama2"

    - response = openai.ChatCompletion.create(
    + response = huggingface.ChatCompletion.create(
    -     model="gpt-3.5-turbo",
    +     model="meta-llama/Llama-2-70b-chat-hf",
          messages=[
              {"role": "system", "content": "You are a helpful assistant."},
              {"role": "user", "content": "Knock knock."},
          ],
      )

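When you switch clients (that is, switch to a different model or backend), make sure your hyperparameters are still valid. For example, the temperature parameter of a GPT-3 model may behave differently from the temperature parameter of a Llama 2 model.

Hyperparameters are settings that adjust a model's behavior and performance. A common one is temperature, which controls the diversity and randomness of the generated text. Different models may have different valid ranges or defaults for it, so after switching models, check that the hyperparameter values match the model in use in order to get the expected results.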
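As an illustration of such a check, the sketch below adapts OpenAI-style sampling parameters before sending them to a different backend. The limits are assumptions, not taken from EasyLLM: it assumes the target backend requires a strictly positive temperature and a top_p strictly between 0 and 1, while OpenAI-style calls may use temperature=0 or top_p=1. `adapt_sampling_params` and the constants are hypothetical; verify the real limits in your backend's documentation.

```python
from typing import Any, Dict

# Assumed limits of the target backend (hypothetical values for illustration).
MIN_TEMPERATURE = 0.01
MIN_TOP_P = 0.01
MAX_TOP_P = 0.99


def adapt_sampling_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `params` with sampling values moved into the assumed valid range."""
    adapted = dict(params)
    temperature = adapted.get("temperature")
    if temperature is not None and temperature < MIN_TEMPERATURE:
        adapted["temperature"] = MIN_TEMPERATURE
    top_p = adapted.get("top_p")
    if top_p is not None:
        adapted["top_p"] = min(max(top_p, MIN_TOP_P), MAX_TOP_P)
    return adapted


openai_style = {"temperature": 0, "top_p": 1.0, "max_tokens": 256}
adapted = adapt_sampling_params(openai_style)
print(adapted)
```

Clamping silently is only one option; raising an error instead makes the mismatch visible during migration.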

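7. Prompt Utilities

The prompt_utils module contains functions that convert a list of message dictionaries into a prompt that can be used with the ChatCompletion client.

The currently supported prompt formats are:

- Llama 2
- Vicuna
- Hugging Face ChatML
- WizardLM
- StableBeluga2
- Open Assistant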
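To show what such a conversion involves, here is a simplified sketch of a Llama 2 style builder. It follows the template described in the Hugging Face Llama 2 blog post (`[INST]`, `<<SYS>>` markers) but is not EasyLLM's build_llama2_prompt; `sketch_llama2_prompt` is a hypothetical name and details such as whitespace handling may differ from the library.

```python
from typing import Dict, List


def sketch_llama2_prompt(messages: List[Dict[str, str]]) -> str:
    """Turn OpenAI-style messages into a Llama 2 chat prompt (simplified)."""
    prompt = ""
    pending_system = ""
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "system":
            # The system prompt is folded into the next user turn.
            pending_system = f"<<SYS>>\n{content}\n<</SYS>>\n\n"
        elif role == "user":
            prompt += f"<s>[INST] {pending_system}{content} [/INST]"
            pending_system = ""
        elif role == "assistant":
            prompt += f" {content} </s>"
        else:
            raise ValueError(f"unsupported role: {role!r}")
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the sun?"},
]
prompt = sketch_llama2_prompt(messages)
print(prompt)
```

Like the builders described in this section, the sketch raises an error when it meets a role it does not support.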

prompt_utils also exports a dictionary, PROMPT_MAPPING, that maps a prompt format name to its prompt-building function. The right builder can then be selected through an environment variable.

    PROMPT_MAPPING = {
        "chatml_falcon": build_chatml_falcon_prompt,
        "chatml_starchat": build_chatml_starchat_prompt,
        "llama2": build_llama2_prompt,
        "open_assistant": build_open_assistant_prompt,
        "stablebeluga": build_stablebeluga_prompt,
        "vicuna": build_vicuna_prompt,
        "wizardlm": build_wizardlm_prompt,
    }
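The selection mechanism can be sketched with a stand-in mapping. Everything here is hypothetical except the HUGGINGFACE_PROMPT variable name mentioned earlier: `SKETCH_MAPPING`, its two toy builders, and `select_builder` only illustrate the name-to-function lookup and do not reproduce EasyLLM's builders.

```python
from typing import Callable, Dict, List, Mapping, Optional

Messages = List[Dict[str, str]]
PromptBuilder = Callable[[Messages], str]


def build_plain_prompt(messages: Messages) -> str:
    """Toy builder: one 'role: content' line per message."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


def build_tagged_prompt(messages: Messages) -> str:
    """Toy builder: wrap each message in a role tag."""
    return "".join(f"[{m['role']}]{m['content']}[/{m['role']}]" for m in messages)


# Stand-in for PROMPT_MAPPING: format name -> builder function.
SKETCH_MAPPING: Dict[str, PromptBuilder] = {
    "plain": build_plain_prompt,
    "tagged": build_tagged_prompt,
}


def select_builder(env: Mapping[str, str]) -> Optional[PromptBuilder]:
    """Look up the builder named by HUGGINGFACE_PROMPT; None if the variable is unset."""
    name = env.get("HUGGINGFACE_PROMPT")
    if name is None:
        return None
    if name not in SKETCH_MAPPING:
        raise ValueError(f"unknown prompt builder {name!r}; choose from {sorted(SKETCH_MAPPING)}")
    return SKETCH_MAPPING[name]


builder = select_builder({"HUGGINGFACE_PROMPT": "tagged"})
text = builder([{"role": "user", "content": "Knock knock."}])
print(text)
```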

The following code sets the prompt builder for the Hugging Face client:

    from easyllm.clients import huggingface

    # Other options: vicuna, chatml_falcon, chatml_starchat, wizardlm, stablebeluga, open_assistant
    huggingface.prompt_builder = "llama2"

7.1 Llama 2 Chat Builder

Creates a prompt for Llama 2 chat conversations. The Hugging Face blog explains how to prompt Llama 2. If a message with an unsupported role is passed, an error is raised.

Example model:

meta-llama/Llama-2-70b-chat-hf

    from easyllm.prompt_utils import build_llama2_prompt

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ]
    prompt = build_llama2_prompt(messages)

7.2 Vicuna Chat Builder

Creates a prompt for Vicuna chat conversations. If a message with an unsupported role is passed, an error is raised.

Example model:

ehartford/WizardLM-13B-V1.0-Uncensored

    from easyllm.prompt_utils import build_vicuna_prompt

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ]
    prompt = build_vicuna_prompt(messages)

7.3 Hugging Face ChatML Builder

Creates a prompt for Hugging Face ChatML chat conversations. Hugging Face ChatML uses different prompts for different models, for example StarChat or Falcon. If a message with an unsupported role is passed, an error is raised.

Example model:

HuggingFaceH4/starchat-beta

7.3.1 StarChat

    from easyllm.prompt_utils import build_chatml_starchat_prompt

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ]
    prompt = build_chatml_starchat_prompt(messages)

7.3.2 Falcon

    from easyllm.prompt_utils import build_chatml_falcon_prompt

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ]
    prompt = build_chatml_falcon_prompt(messages)

7.4 WizardLM Chat Builder

Creates a prompt for WizardLM chat conversations. If a message with an unsupported role is passed, an error is raised.

Example model:

WizardLM/WizardLM-13B-V1.2

    from easyllm.prompt_utils import build_wizardlm_prompt

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ]
    prompt = build_wizardlm_prompt(messages)

7.5 StableBeluga2 Chat Builder

Creates a prompt for StableBeluga2 chat conversations. If a message with an unsupported role is passed, an error is raised.

    from easyllm.prompt_utils import build_stablebeluga_prompt

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ]
    prompt = build_stablebeluga_prompt(messages)

7.6 Open Assistant Chat Builder

Creates the Open Assistant ChatML template, which uses special tokens to mark each part of the conversation. If a message with an unsupported role is passed, an error is raised.

Example model:

OpenAssistant/llama2-13b-orca-8k-3319

    from easyllm.prompt_utils import build_open_assistant_prompt

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ]
    prompt = build_open_assistant_prompt(messages)

8. Examples

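The following examples help you get started with the easyllm library:

- Detailed ChatCompletion example (https://philschmid.github.io/easyllm/examples/chat-completion-api/): shows how to use the ChatCompletion API to hold a conversation with a model.
- Streaming chat requests (https://philschmid.github.io/easyllm/examples/stream-chat-completions/): shows how to stream multiple chat requests to chat with a model efficiently.
- Streaming text requests (https://philschmid.github.io/easyllm/examples/stream-text-completions/): shows how to stream multiple text completion requests.
- Detailed Completion example (https://philschmid.github.io/easyllm/examples/text-completion-api/): uses the TextCompletion API to generate text with a model.
- Creating embeddings (https://philschmid.github.io/easyllm/examples/get-embeddings/): uses a model to embed text into a vector representation.
- Hugging Face Inference Endpoints example (https://philschmid.github.io/easyllm/examples/inference-endpoints-example/): shows how to use a custom endpoint, such as an Inference Endpoint or localhost.
- Retrieval-augmented generation with Llama 2 (https://philschmid.github.io/easyllm/examples/llama2-rag-example/): shows how to use Llama 2 70B for in-context retrieval augmentation.
- Llama 2 70B agent and tool use example (https://philschmid.github.io/easyllm/examples/llama2-agent-example/): shows how to use Llama 2 70B to interact with tools and act as an agent.

These examples cover the main features of EasyLLM: chat, text completion, and embeddings.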

9. References

- EasyLLM GitHub: https://github.com/philschmid/easyllm
- Llama 2 Prompt: https://huggingface.co/blog/llama2#how-to-prompt-llama-2
- Vicuna Prompt: https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md#prompt-template
- StarChat Prompt: https://huggingface.co/HuggingFaceH4/starchat-beta
- WizardLM Prompt: https://github.com/nlpxucan/WizardLM/blob/main/WizardLM/src/infer_wizardlm13b.py#L79
- StableBeluga2: https://huggingface.co/stabilityai/StableBeluga2
