Mirrored from
https://github.com/binary-husky/gpt_academic.git
Last synced 2025-12-07 23:16:48 +00:00
Comparing commits
25 commits
version3.4
...
version3.4
| Author | SHA1 | Commit date |
|---|---|---|
| | 601a95c948 | |
| | e18bef2e9c | |
| | f654c1af31 | |
| | e90048a671 | |
| | ea624b1510 | |
| | 057e3dda3c | |
| | 280e14d7b7 | |
| | 9f0cf9fb2b | |
| | b8560b7510 | |
| | d841d13b04 | |
| | efda9e5193 | |
| | 33d2e75aac | |
| | 74941170aa | |
| | cd38949903 | |
| | d87f1eb171 | |
| | cd1e4e1ba7 | |
| | cf5f348d70 | |
| | f3e4e26e2f | |
| | d5bab093f9 | |
| | f94b167dc2 | |
| | 016d8ee156 | |
| | dca9ec4bae | |
| | 7fdf0a8e51 | |
| | 9a5a509dd9 | |
| | f3205994ea | |
README.md (30 changed lines)
@@ -97,7 +97,7 @@ cd gpt_academic
 2. Configure API_KEY
 
-In `config.py`, configure the API KEY and other settings ([special network environment settings](https://github.com/binary-husky/gpt_academic/issues/1)).
+In `config.py`, configure the API KEY and other settings ([click here for how to configure special network environments](https://github.com/binary-husky/gpt_academic/issues/1)).
 
 (P.S. When the program runs, it first checks for a private configuration file named `config_private.py` and uses its values to override the same-named options in `config.py`. So if you understand this configuration-reading logic, we strongly recommend creating a new configuration file named `config_private.py` next to `config.py` and moving (copying) the options from `config.py` into `config_private.py`. `config_private.py` is not tracked by git, which keeps your private information safer. P.S. The project also supports configuring most options through `environment variables`; see the `docker-compose` file for the environment-variable format. Reading priority: `environment variables` > `config_private.py` > `config.py`.)
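The reading priority described in that paragraph (environment variables over `config_private.py` over `config.py`) can be sketched as a simple fallback chain. The helper name `read_option` and the dict-based stand-ins for the two config files are illustrative assumptions, not the project's actual implementation:

```python
import os

def read_option(name, config_private, config_default):
    """Resolve one option: env var > config_private.py > config.py (illustrative)."""
    if name in os.environ:           # highest priority: environment variable
        return os.environ[name]
    if name in config_private:       # next: the untracked private override file
        return config_private[name]
    return config_default[name]      # fallback: the shipped config.py

# Example resolution
config_default = {"WEB_PORT": 50923, "USE_PROXY": False}
config_private = {"USE_PROXY": True}
os.environ["WEB_PORT"] = "12303"

print(read_option("WEB_PORT", config_private, config_default))   # env var wins
print(read_option("USE_PROXY", config_private, config_default))  # private file wins
```

Note that values read from environment variables arrive as strings, which is why docker-compose files quote them.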
@@ -140,15 +140,9 @@ AVAIL_LLM_MODELS = ["gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-
 python main.py
 ```
 
-5. Test the function plugins
-```
-- Test the function-plugin template function (asks GPT to answer what happened in history on this day); you can use this function as a template to implement more complex features
-    Click "[Function Plugin Template Demo] Today in History"
-```
 
 ## Installation method 2: use Docker
 
-1. ChatGPT only (recommended for most people)
+1. ChatGPT only (recommended for most people; equivalent to docker-compose scheme 1)
 
 ``` sh
 git clone https://github.com/binary-husky/gpt_academic.git # download the project
@@ -161,41 +155,43 @@ docker run --rm -it --net=host gpt-academic
 # (last step, option 2) On macOS/Windows, you can only use the -p option to expose a container port (e.g. 50923) to a host port
 docker run --rm -it -e WEB_PORT=50923 -p 50923:50923 gpt-academic
 ```
-P.S. If you need the Latex-dependent plugin features, see the Wiki
+P.S. If you need the Latex-dependent plugin features, see the Wiki. Alternatively, you can get the Latex features directly via docker-compose (edit docker-compose.yml, keep scheme 4 and delete the other schemes).
 
 2. ChatGPT + ChatGLM + MOSS (requires familiarity with Docker)
 
 ``` sh
-# Edit docker-compose.yml: delete schemes 1 and 3, keep scheme 2. Configure scheme 2 in docker-compose.yml by following the comments there
+# Edit docker-compose.yml: keep scheme 2 and delete the other schemes. Configure scheme 2 in docker-compose.yml by following the comments there
 docker-compose up
 ```
 
 3. ChatGPT + LLAMA + Pangu + RWKV (requires familiarity with Docker)
 ``` sh
-# Edit docker-compose.yml: delete schemes 1 and 2, keep scheme 3. Configure scheme 3 in docker-compose.yml by following the comments there
+# Edit docker-compose.yml: keep scheme 3 and delete the other schemes. Configure scheme 3 in docker-compose.yml by following the comments there
 docker-compose up
 ```
 
 
 ## Installation method 3: other deployment options
 1. One-click run script.
-Windows users completely unfamiliar with the Python environment can download the one-click run script published in [Release](https://github.com/binary-husky/gpt_academic/releases) to install the version without local models,
-This method is not recommended for users who already have Python on their machine (installing the plugin dependencies on top of it is troublesome).
+Windows users completely unfamiliar with the Python environment can download the one-click run script published in [Release](https://github.com/binary-husky/gpt_academic/releases) to install the version without local models.
 The script is contributed from [oobabooga](https://github.com/oobabooga/one-click-installers).
 
 2. Run with docker-compose.
 Read docker-compose.yml and follow the instructions in it
 
-3. How to use a reverse-proxy URL / Microsoft Azure API.
+3. How to use a reverse-proxy URL
 Configure API_URL_REDIRECT by following the instructions in `config.py`.
 
-4. Remote cloud-server deployment (requires cloud-server knowledge and experience).
+4. Microsoft Azure API
+Configure it by following the instructions in `config.py` (the four options AZURE_ENDPOINT etc.)
+
+5. Remote cloud-server deployment (requires cloud-server knowledge and experience).
 Visit the [deployment wiki-1](https://github.com/binary-husky/gpt_academic/wiki/%E4%BA%91%E6%9C%8D%E5%8A%A1%E5%99%A8%E8%BF%9C%E7%A8%8B%E9%83%A8%E7%BD%B2%E6%8C%87%E5%8D%97)
 
-5. Use WSL2 (Windows Subsystem for Linux).
+6. Use WSL2 (Windows Subsystem for Linux).
 Visit the [deployment wiki-2](https://github.com/binary-husky/gpt_academic/wiki/%E4%BD%BF%E7%94%A8WSL2%EF%BC%88Windows-Subsystem-for-Linux-%E5%AD%90%E7%B3%BB%E7%BB%9F%EF%BC%89%E9%83%A8%E7%BD%B2)
 
-6. How to run under a sub-path URL (e.g. `http://localhost/subpath`).
+7. How to run under a sub-path URL (e.g. `http://localhost/subpath`).
 Visit the [FastAPI notes](docs/WithFastapi.md)
 
 ---
config.py (12 changed lines)
@@ -1,6 +1,7 @@
 # [step 1]>> e.g.: API_KEY = "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" (this key is invalid)
 API_KEY = "sk-此处填API密钥"    # several API keys may be supplied at once, separated by ASCII commas, e.g. API_KEY = "sk-openaikey1,sk-openaikey2,fkxxxx-api2dkey1,fkxxxx-api2dkey2"
 
 
 # [step 2]>> set to True to use a proxy; do not change this if deploying directly on an overseas server
 USE_PROXY = False
 if USE_PROXY:
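The comment above says several API keys can live in one comma-separated `API_KEY` string. A minimal sketch of splitting that string and picking one key; the helper name `select_api_key` and the random choice are illustrative assumptions, not the project's exact selection logic:

```python
import random

def select_api_key(api_key_setting):
    """Split a comma-separated API_KEY string and pick one key.

    The random choice here is an illustrative assumption; a real
    implementation might filter by key prefix or rotate round-robin.
    """
    keys = [k.strip() for k in api_key_setting.split(",") if k.strip()]
    if not keys:
        raise ValueError("no API key configured")
    return random.choice(keys)

API_KEY = "sk-openaikey1,sk-openaikey2,fkxxxx-api2dkey1"
print(select_api_key(API_KEY))  # one of the three keys
```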
@@ -46,8 +47,8 @@ MAX_RETRY = 2
 
 # Model selection (note: LLM_MODEL is the model selected by default, and it must be included in the AVAIL_LLM_MODELS switch list)
 LLM_MODEL = "gpt-3.5-turbo" # options ↓↓↓
-AVAIL_LLM_MODELS = ["gpt-3.5-turbo-16k", "gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "chatglm", "moss", "newbing", "newbing-free", "stack-claude"]
+AVAIL_LLM_MODELS = ["gpt-3.5-turbo-16k", "gpt-3.5-turbo", "azure-gpt35", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "chatglm", "moss", "newbing", "newbing-free", "stack-claude"]
-# P.S. other available models also include ["newbing-free", "jittorllms_rwkv", "jittorllms_pangualpha", "jittorllms_llama"]
+# P.S. other available models also include ["gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k-0613", "newbing-free", "jittorllms_rwkv", "jittorllms_pangualpha", "jittorllms_llama"]
 
 # Execution device for local LLMs such as ChatGLM: CPU/GPU
 LOCAL_MODEL_DEVICE = "cpu" # option: "cuda"
@@ -81,3 +82,10 @@ your bing cookies here
 # To use Slack Claude, see request_llm/README.md for a detailed tutorial
 SLACK_CLAUDE_BOT_ID = ''
 SLACK_CLAUDE_USER_TOKEN = ''
 
 
+# To use AZURE, see the extra document docs\use_azure.md for details
+AZURE_ENDPOINT = "https://你的api名称.openai.azure.com/"
+AZURE_API_KEY = "填入azure openai api的密钥"
+AZURE_API_VERSION = "填入api版本"
+AZURE_ENGINE = "填入ENGINE"
@@ -188,7 +188,15 @@ def test_Latex():
 # txt = r"https://arxiv.org/abs/2305.17608"
 # txt = r"https://arxiv.org/abs/2211.16068"  # ACE
 # txt = r"C:\Users\x\arxiv_cache\2211.16068\workfolder"  # ACE
-txt = r"https://arxiv.org/abs/2002.09253"
+# txt = r"https://arxiv.org/abs/2002.09253"
+# txt = r"https://arxiv.org/abs/2306.07831"
+txt = r"https://arxiv.org/abs/2212.10156"
+# txt = r"https://arxiv.org/abs/2211.11559"
+# txt = r"https://arxiv.org/abs/2303.08774"
+# txt = r"https://arxiv.org/abs/2303.12712"
+# txt = r"C:\Users\fuqingxu\arxiv_cache\2303.12712\workfolder"
 
 
 for cookies, cb, hist, msg in (Latex翻译中文并重新编译PDF)(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
     cli_printer.print(cb) # print(cb)
 
@@ -217,6 +225,7 @@ def test_Latex():
 # test_数学动画生成manim()
 # test_Langchain知识库()
 # test_Langchain知识库读取()
-test_Latex()
-input("程序完成,回车退出。")
-print("退出。")
+if __name__ == "__main__":
+    test_Latex()
+    input("程序完成,回车退出。")
+    print("退出。")
@@ -8,24 +8,31 @@ pj = os.path.join
 """
 ========================================================================
 Part One
-Latex segmentation to a linklist
+Latex segmentation with a binary mask (PRESERVE=0, TRANSFORM=1)
 ========================================================================
 """
 PRESERVE = 0
 TRANSFORM = 1
 
-def split_worker(text, mask, pattern, flags=0):
+def set_forbidden_text(text, mask, pattern, flags=0):
     """
     Add a preserve text area in this paper
+    e.g. with pattern = r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}"
+    you can mask out (mask = PRESERVE, so that the text becomes untouchable for GPT)
+    everything between "\begin{algorithm}" and "\end{algorithm}"
     """
+    if isinstance(pattern, list): pattern = '|'.join(pattern)
     pattern_compile = re.compile(pattern, flags)
     for res in pattern_compile.finditer(text):
         mask[res.span()[0]:res.span()[1]] = PRESERVE
     return text, mask
 
-def split_worker_careful_brace(text, mask, pattern, flags=0):
+def set_forbidden_text_careful_brace(text, mask, pattern, flags=0):
     """
-    Move area into preserve area
+    Add a preserve text area in this paper (the text becomes untouchable for GPT).
+    Count the braces so as to catch the complete text area.
+    e.g.
+    \caption{blablablablabla\texbf{blablabla}blablabla.}
     """
     pattern_compile = re.compile(pattern, flags)
     for res in pattern_compile.finditer(text):
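The masking idea behind `set_forbidden_text` — PRESERVE regions become untouchable for GPT while everything else stays TRANSFORM — can be exercised with a self-contained sketch. It mirrors the function shown in the diff but is a standalone illustration, not the project's exact code:

```python
import re
import numpy as np

PRESERVE = 0   # text GPT must not touch
TRANSFORM = 1  # text GPT may rewrite

def set_forbidden_text(text, mask, pattern, flags=0):
    # Accept a single pattern or a list of alternatives (joined with '|')
    if isinstance(pattern, list):
        pattern = '|'.join(pattern)
    for res in re.compile(pattern, flags).finditer(text):
        mask[res.span()[0]:res.span()[1]] = PRESERVE
    return text, mask

text = r"Intro. \begin{equation}E=mc^2\end{equation} Outro."
mask = np.zeros(len(text), dtype=np.uint8) + TRANSFORM
text, mask = set_forbidden_text(
    text, mask, r"\\begin\{equation\}(.*?)\\end\{equation\}", re.DOTALL)

# The equation block is now frozen; the prose around it remains editable
preserved = ''.join(c for c, m in zip(text, mask) if m == PRESERVE)
print(preserved)
```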
@@ -40,9 +47,12 @@ def split_worker_careful_brace(text, mask, pattern, flags=0):
         mask[begin:end] = PRESERVE
     return text, mask
 
-def split_worker_reverse_careful_brace(text, mask, pattern, flags=0):
+def reverse_forbidden_text_careful_brace(text, mask, pattern, flags=0, forbid_wrapper=True):
     """
-    Move area out of preserve area
+    Move an area out of the preserve area (make the text editable for GPT).
+    Count the braces so as to catch the complete text area.
+    e.g.
+    \caption{blablablablabla\texbf{blablabla}blablabla.}
     """
     pattern_compile = re.compile(pattern, flags)
     for res in pattern_compile.finditer(text):
@@ -55,9 +65,12 @@ def split_worker_reverse_careful_brace(text, mask, pattern, flags=0):
             p += 1
         end = p
         mask[begin:end] = TRANSFORM
+        if forbid_wrapper:
+            mask[res.regs[0][0]:begin] = PRESERVE
+            mask[end:res.regs[0][1]] = PRESERVE
     return text, mask
 
-def split_worker_begin_end(text, mask, pattern, flags=0, limit_n_lines=42):
+def set_forbidden_text_begin_end(text, mask, pattern, flags=0, limit_n_lines=42):
     """
     Find all \begin{} ... \end{} text blocks with fewer than limit_n_lines lines.
     Add them to the preserve area
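The brace counting in the `careful_brace` helpers — walking forward from a command's opening `{` and tracking `{`/`}` depth so the whole argument is captured even when it contains nested commands — can be sketched in isolation (a standalone illustration, not the project's code):

```python
import re

def find_brace_argument(text, command=r"\\caption\{"):
    """Return the span of the full {...} argument of a LaTeX command,
    counting braces so nested groups like \textbf{...} are included."""
    m = re.search(command, text)
    if m is None:
        return None
    begin = m.end() - 1              # position of the opening '{'
    depth, p = 0, begin
    while p < len(text):
        if text[p] == '{':
            depth += 1
        elif text[p] == '}':
            depth -= 1
            if depth == 0:
                break                # found the matching closing brace
        p += 1
    return begin, p + 1              # span covering '{...}' inclusive

tex = r"Before \caption{outer \textbf{inner} tail.} after"
span = find_brace_argument(tex)
print(tex[span[0]:span[1]])
```

A lazy regex like `\\caption\{(.*?)\}` alone would stop at the first `}` inside `\textbf{...}`, which is exactly why the counting pass is needed.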
@@ -110,19 +123,41 @@ Latex Merge File
 def 寻找Latex主文件(file_manifest, mode):
     """
     Among multiple Tex documents, find the main file; it must contain documentclass. Return the first one found.
-    P.S. Let's hope nobody puts a latex template in there
+    P.S. Let's hope nobody puts a latex template in there (6.25: added code to detect latex templates)
     """
+    canidates = []
     for texf in file_manifest:
         if os.path.basename(texf).startswith('merge'):
             continue
         with open(texf, 'r', encoding='utf8') as f:
             file_content = f.read()
         if r'\documentclass' in file_content:
-            return texf
+            canidates.append(texf)
         else:
             continue
-    raise RuntimeError('无法找到一个主Tex文件(包含documentclass关键字)')
+
+    if len(canidates) == 0:
+        raise RuntimeError('无法找到一个主Tex文件(包含documentclass关键字)')
+    elif len(canidates) == 1:
+        return canidates[0]
+    else:  # if len(canidates) >= 2: penalize each latex source file for words common in Latex templates (but rare in real body text), and return the highest-scoring one
+        canidates_score = []
+        # words that indicate a template document, used as penalties
+        unexpected_words = ['\LaTeX', 'manuscript', 'Guidelines', 'font', 'citations', 'rejected', 'blind review', 'reviewers']
+        expected_words = ['\input', '\ref', '\cite']
+        for texf in canidates:
+            canidates_score.append(0)
+            with open(texf, 'r', encoding='utf8') as f:
+                file_content = f.read()
+            for uw in unexpected_words:
+                if uw in file_content:
+                    canidates_score[-1] -= 1
+            for uw in expected_words:
+                if uw in file_content:
+                    canidates_score[-1] += 1
+        select = np.argmax(canidates_score)  # return the highest-scoring candidate
+        return canidates[select]
 
 def rm_comments(main_file):
     new_file_remove_comment_lines = []
     for l in main_file.splitlines():
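The tie-breaking heuristic above (penalize template-ish words, reward body-text commands, pick the argmax) can be exercised on its own; the two sample file contents below are invented for the demonstration:

```python
# Scores mimic the diff's heuristic: -1 per template-ish word, +1 per body-text command.
unexpected_words = ['manuscript', 'Guidelines', 'blind review']
expected_words = [r'\input', r'\ref', r'\cite']

def score(content):
    s = 0
    s -= sum(1 for w in unexpected_words if w in content)  # template penalty
    s += sum(1 for w in expected_words if w in content)    # real-paper bonus
    return s

template = r"\documentclass{article} Guidelines for manuscript preparation, blind review."
paper = r"\documentclass{article} We show in \ref{fig1} (see \cite{x}); \input{body}"

candidates = [template, paper]
scores = [score(c) for c in candidates]
best = max(range(len(candidates)), key=scores.__getitem__)  # argmax without numpy
print(scores, best)  # the real paper wins
```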
@@ -132,6 +167,7 @@ def rm_comments(main_file):
         else:
             new_file_remove_comment_lines.append(l)
     main_file = '\n'.join(new_file_remove_comment_lines)
+    # main_file = re.sub(r"\\include{(.*?)}", r"\\input{\1}", main_file)  # convert \include commands into \input commands
     main_file = re.sub(r'(?<!\\)%.*', '', main_file)  # use a regular expression to find half-line comments and replace them with an empty string
     return main_file
 
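The `(?<!\\)%.*` pattern removes everything from an unescaped `%` to the end of the line while leaving escaped percent signs (`\%`) alone, thanks to the negative lookbehind. A quick standalone check with an invented sample line:

```python
import re

line = r"Accuracy rose by 5\% overall.  % TODO: re-run with seed 42"
# (?<!\\) rejects a '%' preceded by a backslash, so '5\%' survives
cleaned = re.sub(r'(?<!\\)%.*', '', line)
print(cleaned)
```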
@@ -178,9 +214,11 @@ def merge_tex_files(project_foler, main_file, mode):
     main_file = re.sub(r"\\documentclass\[(.*?)\]{(.*?)}", r"\\documentclass[\1,fontset=windows,UTF8]{\2}", main_file)
     main_file = re.sub(r"\\documentclass{(.*?)}", r"\\documentclass[fontset=windows,UTF8]{\1}", main_file)
     # find paper abstract
-    pattern = re.compile(r'\\begin\{abstract\}.*\n')
-    match = pattern.search(main_file)
-    assert match is not None, "Cannot find paper abstract section!"
+    pattern_opt1 = re.compile(r'\\begin\{abstract\}.*\n')
+    pattern_opt2 = re.compile(r"\\abstract\{(.*?)\}", flags=re.DOTALL)
+    match_opt1 = pattern_opt1.search(main_file)
+    match_opt2 = pattern_opt2.search(main_file)
+    assert (match_opt1 is not None) or (match_opt2 is not None), "Cannot find paper abstract section!"
     return main_file
 
 
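The change accepts either abstract syntax: the `\begin{abstract}…\end{abstract}` environment or the `\abstract{…}` command. A standalone check of the two patterns, with hand-written sample strings:

```python
import re

pattern_opt1 = re.compile(r'\\begin\{abstract\}.*\n')
pattern_opt2 = re.compile(r"\\abstract\{(.*?)\}", flags=re.DOTALL)

env_style = "\\begin{abstract}\nWe study...\n\\end{abstract}\n"
cmd_style = r"\abstract{We study...}"

def has_abstract(tex):
    # a document passes if either form of abstract is present
    return (pattern_opt1.search(tex) is not None) or (pattern_opt2.search(tex) is not None)

print(has_abstract(env_style), has_abstract(cmd_style), has_abstract("no abstract"))
```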
@@ -212,6 +250,8 @@ def fix_content(final_tex, node_string):
     final_tex = re.sub(r"\\\ ([a-z]{2,10})\{", r"\\\1{", string=final_tex)
     final_tex = re.sub(r"\\([a-z]{2,10})\{([^\}]*?)\}", mod_inbraket, string=final_tex)
 
+    if "Traceback" in final_tex and "[Local Message]" in final_tex:
+        final_tex = node_string  # something went wrong; restore the original text
     if node_string.count('\\begin') != final_tex.count('\\begin'):
         final_tex = node_string  # something went wrong; restore the original text
     if node_string.count('\_') > 0 and node_string.count('\_') > final_tex.count('\_'):
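These guards share one pattern: if the model's output fails a sanity check (an error traceback leaked into the text, or the `\begin` counts no longer match), the node falls back to its original text. A compact sketch of that validate-or-revert idea; the sample strings are invented:

```python
def fix_content(final_tex, node_string):
    """Return the model output only if it passes sanity checks; else revert."""
    if "Traceback" in final_tex and "[Local Message]" in final_tex:
        return node_string               # an error message leaked into the output
    if node_string.count('\\begin') != final_tex.count('\\begin'):
        return node_string               # environment structure was damaged
    return final_tex

original = r"\begin{itemize}\item a\end{itemize}"
good = r"\begin{itemize}\item b\end{itemize}"
bad = r"translation lost the environment \item b"

print(fix_content(good, original))   # structure intact: kept
print(fix_content(bad, original))    # \begin count changed: reverted
```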
@@ -259,45 +299,33 @@ def split_subprocess(txt, project_folder, return_dict, opts):
     mask = np.zeros(len(txt), dtype=np.uint8) + TRANSFORM
 
     # absorb everything above the title and the authors
-    text, mask = split_worker(text, mask, r"(.*?)\\maketitle", re.DOTALL)
-    # delete iffalse comments
-    text, mask = split_worker(text, mask, r"\\iffalse(.*?)\\fi", re.DOTALL)
-    # absorb begin-end pairs within 25 lines
-    text, mask = split_worker_begin_end(text, mask, r"\\begin\{([a-z\*]*)\}(.*?)\\end\{\1\}", re.DOTALL, limit_n_lines=25)
-    # absorb anonymous formulas
-    text, mask = split_worker(text, mask, r"\$\$(.*?)\$\$", re.DOTALL)
-    # absorb other miscellany
-    text, mask = split_worker(text, mask, r"\\section\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\section\*\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\subsection\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\subsubsection\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\bibliography\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\bibliographystyle\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\begin\{lstlisting\}(.*?)\\end\{lstlisting\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{wraptable\}(.*?)\\end\{wraptable\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{wrapfigure\}(.*?)\\end\{wrapfigure\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{wrapfigure\*\}(.*?)\\end\{wrapfigure\*\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{figure\}(.*?)\\end\{figure\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{figure\*\}(.*?)\\end\{figure\*\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{multline\}(.*?)\\end\{multline\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{multline\*\}(.*?)\\end\{multline\*\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{table\}(.*?)\\end\{table\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{table\*\}(.*?)\\end\{table\*\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{minipage\}(.*?)\\end\{minipage\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{minipage\*\}(.*?)\\end\{minipage\*\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{align\*\}(.*?)\\end\{align\*\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{align\}(.*?)\\end\{align\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{equation\}(.*?)\\end\{equation\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{equation\*\}(.*?)\\end\{equation\*\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\item ")
-    text, mask = split_worker(text, mask, r"\\label\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\begin\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\vspace\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\hspace\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\end\{(.*?)\}")
-    text, mask = split_worker_careful_brace(text, mask, r"\\hl\{(.*?)\}", re.DOTALL)
-    text, mask = split_worker_reverse_careful_brace(text, mask, r"\\caption\{(.*?)\}", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, r"(.*?)\\maketitle", re.DOTALL)
+    # absorb iffalse comments
+    text, mask = set_forbidden_text(text, mask, r"\\iffalse(.*?)\\fi", re.DOTALL)
+    # absorb begin-end pairs within 42 lines
+    text, mask = set_forbidden_text_begin_end(text, mask, r"\\begin\{([a-z\*]*)\}(.*?)\\end\{\1\}", re.DOTALL, limit_n_lines=42)
+    # absorb anonymous formulas
+    text, mask = set_forbidden_text(text, mask, [ r"\$\$(.*?)\$\$", r"\\\[.*?\\\]" ], re.DOTALL)
+    # absorb other miscellany
+    text, mask = set_forbidden_text(text, mask, [ r"\\section\{(.*?)\}", r"\\section\*\{(.*?)\}", r"\\subsection\{(.*?)\}", r"\\subsubsection\{(.*?)\}" ])
+    text, mask = set_forbidden_text(text, mask, [ r"\\bibliography\{(.*?)\}", r"\\bibliographystyle\{(.*?)\}" ])
+    text, mask = set_forbidden_text(text, mask, r"\\begin\{thebibliography\}.*?\\end\{thebibliography\}", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, r"\\begin\{lstlisting\}(.*?)\\end\{lstlisting\}", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, r"\\begin\{wraptable\}(.*?)\\end\{wraptable\}", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{wrapfigure\}(.*?)\\end\{wrapfigure\}", r"\\begin\{wrapfigure\*\}(.*?)\\end\{wrapfigure\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{figure\}(.*?)\\end\{figure\}", r"\\begin\{figure\*\}(.*?)\\end\{figure\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{multline\}(.*?)\\end\{multline\}", r"\\begin\{multline\*\}(.*?)\\end\{multline\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{table\}(.*?)\\end\{table\}", r"\\begin\{table\*\}(.*?)\\end\{table\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{minipage\}(.*?)\\end\{minipage\}", r"\\begin\{minipage\*\}(.*?)\\end\{minipage\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{align\*\}(.*?)\\end\{align\*\}", r"\\begin\{align\}(.*?)\\end\{align\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{equation\}(.*?)\\end\{equation\}", r"\\begin\{equation\*\}(.*?)\\end\{equation\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\includepdf\[(.*?)\]\{(.*?)\}", r"\\clearpage", r"\\newpage", r"\\appendix", r"\\tableofcontents", r"\\include\{(.*?)\}"])
+    text, mask = set_forbidden_text(text, mask, [r"\\vspace\{(.*?)\}", r"\\hspace\{(.*?)\}", r"\\label\{(.*?)\}", r"\\begin\{(.*?)\}", r"\\end\{(.*?)\}", r"\\item "])
+    text, mask = set_forbidden_text_careful_brace(text, mask, r"\\hl\{(.*?)\}", re.DOTALL)
+    # the reverse operations must come last
+    text, mask = reverse_forbidden_text_careful_brace(text, mask, r"\\caption\{(.*?)\}", re.DOTALL, forbid_wrapper=True)
+    text, mask = reverse_forbidden_text_careful_brace(text, mask, r"\\abstract\{(.*?)\}", re.DOTALL, forbid_wrapper=True)
 
     root = convert_to_linklist(text, mask)
 
     # fix braces
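Several of the new calls pass a *list* of patterns; `set_forbidden_text` joins such a list into one alternation with `'|'.join(pattern)` before compiling. A quick standalone demonstration of that joining step on an invented snippet:

```python
import re

patterns = [r"\$\$(.*?)\$\$", r"\\\[.*?\\\]"]
combined = '|'.join(patterns)  # one regex matching either display-math form

tex = r"inline, then $$a+b$$ and also \[c-d\] displayed."
hits = [m.group(0) for m in re.finditer(combined, tex, re.DOTALL)]
print(hits)
```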
@@ -371,7 +399,7 @@ def split_subprocess(txt, project_folder, return_dict, opts):
             prev_node = node
             node = node.next
             if node is None: break
+        # write an html debug file, marking preserved areas (PRESERVE) in red and transformed areas (TRANSFORM) in black
         with open(pj(project_folder, 'debug_log.html'), 'w', encoding='utf8') as f:
             segment_parts_for_gpt = []
             nodes = []
@@ -423,8 +451,14 @@ class LatexPaperSplit():
         if mode == 'translate_zh':
             pattern = re.compile(r'\\begin\{abstract\}.*\n')
             match = pattern.search(result_string)
-            assert match is not None, "Cannot find paper abstract section!"
-            position = match.end()
+            if not match:
+                # match \abstract{xxxx}
+                pattern_compile = re.compile(r"\\abstract\{(.*?)\}", flags=re.DOTALL)
+                match = pattern_compile.search(result_string)
+                position = match.regs[1][0]
+            else:
+                # match \begin{abstract}xxxx\end{abstract}
+                position = match.end()
             result_string = result_string[:position] + self.msg + msg + self.msg_declare + result_string[position:]
         return result_string
 
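The two branches locate the insertion point differently for the two abstract syntaxes: `match.end()` for the environment form (just after the header line), and the start of the captured group (`match.regs[1][0]`) for the `\abstract{…}` command, so the notice lands just inside the braces. A standalone sketch with an invented notice string:

```python
import re

def insert_notice(result_string, notice):
    match = re.compile(r'\\begin\{abstract\}.*\n').search(result_string)
    if not match:
        # \abstract{...}: insert right after the opening brace
        match = re.compile(r"\\abstract\{(.*?)\}", flags=re.DOTALL).search(result_string)
        position = match.regs[1][0]
    else:
        # \begin{abstract}...: insert after the header line
        position = match.end()
    return result_string[:position] + notice + result_string[position:]

print(insert_notice(r"\abstract{We propose...}", "[translated] "))
```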
@@ -443,6 +477,7 @@ class LatexPaperSplit():
                                     args=(txt, project_folder, return_dict, opts))
         p.start()
         p.join()
+        p.close()
         self.nodes = return_dict['nodes']
         self.sp = return_dict['segment_parts_for_gpt']
         return self.sp
@@ -103,3 +103,30 @@ services:
         echo '[jittorllms] 正在从github拉取最新代码...' &&
         git --git-dir=request_llm/jittorllms/.git --work-tree=request_llm/jittorllms pull --force &&
         python3 -u main.py"
+
+
+## ===================================================
+## [Scheme 4] chatgpt + Latex
+## ===================================================
+version: '3'
+services:
+  gpt_academic_with_latex:
+    image: ghcr.io/binary-husky/gpt_academic_with_latex:master
+    environment:
+      # see `config.py` for all configuration options
+      API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
+      USE_PROXY: ' True '
+      proxies: ' { "http": "socks5h://localhost:10880", "https": "socks5h://localhost:10880", } '
+      LLM_MODEL: ' gpt-3.5-turbo '
+      AVAIL_LLM_MODELS: ' ["gpt-3.5-turbo", "gpt-4"] '
+      LOCAL_MODEL_DEVICE: ' cuda '
+      DEFAULT_WORKER_NUM: ' 10 '
+      WEB_PORT: ' 12303 '
+
+    # merge with the host network
+    network_mode: "host"
+
+    # pull the latest code without going through a proxy network
+    command: >
+      bash -c "python3 -u main.py"
docs/use_azure.md (new file, 152 lines)
@@ -0,0 +1,152 @@
|
|||||||
|
# 通过微软Azure云服务申请 Openai API
|
||||||
|
|
||||||
|
由于Openai和微软的关系,现在是可以通过微软的Azure云计算服务直接访问openai的api,免去了注册和网络的问题。
|
||||||
|
|
||||||
|
快速入门的官方文档的链接是:[快速入门 - 开始通过 Azure OpenAI 服务使用 ChatGPT 和 GPT-4 - Azure OpenAI Service | Microsoft Learn](https://learn.microsoft.com/zh-cn/azure/cognitive-services/openai/chatgpt-quickstart?pivots=programming-language-python)
|
||||||
|
|
||||||
|
# 申请API
|
||||||
|
|
||||||
|
按文档中的“先决条件”的介绍,出了编程的环境以外,还需要以下三个条件:
|
||||||
|
|
||||||
|
1. Azure账号并创建订阅
|
||||||
|
|
||||||
|
2. 为订阅添加Azure OpenAI 服务
|
||||||
|
|
||||||
|
3. 部署模型
|
||||||
|
|
||||||
|
## Azure账号并创建订阅
|
||||||
|
|
||||||
|
### Azure账号
|
||||||
|
|
||||||
|
创建Azure的账号时最好是有微软的账号,这样似乎更容易获得免费额度(第一个月的200美元,实测了一下,如果用一个刚注册的微软账号登录Azure的话,并没有这一个月的免费额度)。
|
||||||
|
|
||||||
|
创建Azure账号的网址是:[立即创建 Azure 免费帐户 | Microsoft Azure](https://azure.microsoft.com/zh-cn/free/)
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
打开网页后,点击 “免费开始使用” 会跳转到登录或注册页面,如果有微软的账户,直接登录即可,如果没有微软账户,那就需要到微软的网页再另行注册一个。
|
||||||
|
|
||||||
|
注意,Azure的页面和政策时不时会变化,已实际最新显示的为准就好。
|
||||||
|
|
||||||
|
### 创建订阅
|
||||||
|
|
||||||
|
注册好Azure后便可进入主页:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
首先需要在订阅里进行添加操作,点开后即可进入订阅的页面:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
第一次进来应该是空的,点添加即可创建新的订阅(可以是“免费”或者“即付即用”的订阅),其中订阅ID是后面申请Azure OpenAI需要使用的。
|
||||||
|
|
||||||
|
## 为订阅添加Azure OpenAI服务
|
||||||
|
|
||||||
|
之后回到首页,点Azure OpenAI即可进入OpenAI服务的页面(如果不显示的话,则在首页上方的搜索栏里搜索“openai”即可)。
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
不过现在这个服务还不能用。在使用前,还需要在这个网址申请一下:
|
||||||
|
|
||||||
|
[Request Access to Azure OpenAI Service (microsoft.com)](https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_Pz4b1_xUOFA5Qk1UWDRBMjg0WFhPMkIzTzhKQ1dWNyQlQCN0PWcu)
|
||||||
|
|
||||||
|
这里有二十来个问题,按照要求和自己的实际情况填写即可。
|
||||||
|
|
||||||
|
其中需要注意的是
|
||||||
|
|
||||||
|
1. 千万记得填对"订阅ID"
|
||||||
|
|
||||||
|
2. 需要填一个公司邮箱(可以不是注册用的邮箱)和公司网址
|
||||||
|
|
||||||
|
之后,在回到上面那个页面,点创建,就会进入创建页面了:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
需要填入“资源组”和“名称”,按照自己的需要填入即可。
|
||||||
|
|
||||||
|
完成后,在主页的“资源”里就可以看到刚才创建的“资源”了,点击进入后,就可以进行最后的部署了。
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
## 部署模型
|
||||||
|
|
||||||
|
进入资源页面后,在部署模型前,可以先点击“开发”,把密钥和终结点记下来。
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
之后,就可以去部署模型了,点击“部署”即可,会跳转到 Azure OpenAI Stuido 进行下面的操作:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
进入 Azure OpenAi Studio 后,点击新建部署,会弹出如下对话框:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
在这里选 gpt-35-turbo 或需要的模型并按需要填入“部署名”即可完成模型的部署。
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
这个部署名需要记下来。
|
||||||
|
|
||||||
|
到现在为止,申请操作就完成了,需要记下来的有下面几个东西:
|
||||||
|
|
||||||
|
● 密钥(1或2都可以)
|
||||||
|
|
||||||
|
● 终结点
|
||||||
|
|
||||||
|
● 部署名(不是模型名)
|
||||||
|
|
||||||
|
# 修改 config.py
|
||||||
|
|
||||||
|
```
|
||||||
|
AZURE_ENDPOINT = "填入终结点"
|
||||||
|
AZURE_API_KEY = "填入azure openai api的密钥"
|
||||||
|
AZURE_API_VERSION = "2023-05-15" # 默认使用 2023-05-15 版本,无需修改
|
||||||
|
AZURE_ENGINE = "填入部署名"
|
||||||
|
|
||||||
|
```
|
||||||
|
# API的使用
|
||||||
|
|
||||||
|
接下来就是具体怎么使用API了,还是可以参考官方文档:[快速入门 - 开始通过 Azure OpenAI 服务使用 ChatGPT 和 GPT-4 - Azure OpenAI Service | Microsoft Learn](https://learn.microsoft.com/zh-cn/azure/cognitive-services/openai/chatgpt-quickstart?pivots=programming-language-python)
|
||||||
|
|
||||||
|
和openai自己的api调用有点类似,都需要安装openai库,不同的是调用方式
|
||||||
|
|
||||||
|
```
|
||||||
|
import openai
|
||||||
|
openai.api_type = "azure" #固定格式,无需修改
|
||||||
|
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT") #这里填入“终结点”
|
||||||
|
openai.api_version = "2023-05-15" #固定格式,无需修改
|
||||||
|
openai.api_key = os.getenv("AZURE_OPENAI_KEY") #这里填入“密钥1”或“密钥2”
|
||||||
|
|
||||||
|
response = openai.ChatCompletion.create(
|
||||||
|
engine="gpt-35-turbo", #这里填入的不是模型名,是部署名
|
||||||
|
messages=[
|
||||||
|
{"role": "system", "content": "You are a helpful assistant."},
|
||||||
|
{"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
|
||||||
|
{"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},
|
||||||
|
{"role": "user", "content": "Do other Azure Cognitive Services support this too?"}
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
|
print(response)
|
||||||
|
print(response['choices'][0]['message']['content'])
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
Note that:

1. The `engine` argument takes the deployment name, not the model name.

2. The `response` obtained through the openai library differs from a response fetched via the `requests` library: it needs no decoding, since it is already parsed JSON, so you can read values directly by key.

For further details, see the official API documentation.
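The difference described in point 2 can be illustrated offline. In the sketch below, the `response` dict is hand-built to imitate the shape of the object that `openai.ChatCompletion.create` returns (no network call is made); the point is that nested keys can be indexed directly, with no decoding step:

```python
# A minimal offline sketch: the dict imitates the parsed response shape.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "Yes, customer managed keys are supported."
            },
            "finish_reason": "stop",
        }
    ]
}

# No .decode() or json.loads() needed -- index straight into the structure:
answer = response["choices"][0]["message"]["content"]
finish_reason = response["choices"][0]["finish_reason"]
print(answer)         # -> Yes, customer managed keys are supported.
print(finish_reason)  # -> stop
```

With the `requests` library you would first call `resp.json()` on the HTTP response to get to this point; the openai library does that step for you.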
# About fees

The Azure OpenAI API does cost money (a free subscription is only valid for one month). The pricing is as follows:

![image]()

For details see: [Azure OpenAI Service - Pricing | Microsoft Azure](https://azure.microsoft.com/zh-cn/pricing/details/cognitive-services/openai-service/?cdn=disable)

It is not the "free ride for a year" that some posts claim, but both registration and network access are somewhat simpler than using OpenAI's API directly.
```
@@ -16,6 +16,9 @@ from toolbox import get_conf, trimmed_format_exc
 from .bridge_chatgpt import predict_no_ui_long_connection as chatgpt_noui
 from .bridge_chatgpt import predict as chatgpt_ui
+
+from .bridge_azure_test import predict_no_ui_long_connection as azure_noui
+from .bridge_azure_test import predict as azure_ui

 from .bridge_chatglm import predict_no_ui_long_connection as chatglm_noui
 from .bridge_chatglm import predict as chatglm_ui
@@ -93,6 +96,24 @@ model_info = {
         "token_cnt": get_token_num_gpt35,
     },
+
+    "gpt-3.5-turbo-0613": {
+        "fn_with_ui": chatgpt_ui,
+        "fn_without_ui": chatgpt_noui,
+        "endpoint": openai_endpoint,
+        "max_token": 4096,
+        "tokenizer": tokenizer_gpt35,
+        "token_cnt": get_token_num_gpt35,
+    },
+
+    "gpt-3.5-turbo-16k-0613": {
+        "fn_with_ui": chatgpt_ui,
+        "fn_without_ui": chatgpt_noui,
+        "endpoint": openai_endpoint,
+        "max_token": 1024 * 16,
+        "tokenizer": tokenizer_gpt35,
+        "token_cnt": get_token_num_gpt35,
+    },
     "gpt-4": {
         "fn_with_ui": chatgpt_ui,
         "fn_without_ui": chatgpt_noui,
@@ -102,6 +123,16 @@ model_info = {
         "token_cnt": get_token_num_gpt4,
     },
+
+    # azure openai
+    "azure-gpt35":{
+        "fn_with_ui": azure_ui,
+        "fn_without_ui": azure_noui,
+        "endpoint": get_conf("AZURE_ENDPOINT"),
+        "max_token": 4096,
+        "tokenizer": tokenizer_gpt35,
+        "token_cnt": get_token_num_gpt35,
+    },
     # api_2d
     "api2d-gpt-3.5-turbo": {
         "fn_with_ui": chatgpt_ui,
```
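The diff above registers each model in a `model_info` lookup table that maps a model name to its UI and non-UI handler functions, endpoint, and token budget; callers then dispatch on the requested model name. A minimal sketch of that dispatch-table pattern (the handlers below are stand-ins, not the project's real bridge functions):

```python
# Minimal sketch of the dispatch-table pattern behind model_info.
# The handlers here are hypothetical stand-ins for the bridge functions.
def azure_ui(prompt):
    return f"[azure] {prompt}"

def chatgpt_ui(prompt):
    return f"[openai] {prompt}"

model_info = {
    "azure-gpt35":   {"fn_with_ui": azure_ui,   "max_token": 4096},
    "gpt-3.5-turbo": {"fn_with_ui": chatgpt_ui, "max_token": 4096},
}

def predict(llm_model, prompt):
    # Look up the handler registered for the requested model name.
    handler = model_info[llm_model]["fn_with_ui"]
    return handler(prompt)

print(predict("azure-gpt35", "hello"))  # -> [azure] hello
```

Adding a new backend then amounts to writing its two bridge functions and appending one entry to the table, which is exactly what the diff does for `azure-gpt35`.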
request_llm/bridge_azure_test.py (new file, 241 lines)

```
"""
This file contains three main functions.

Function without multi-threading capability:
1. predict: used for normal conversation; fully interactive; not multi-threaded

Functions with multi-threading capability:
2. predict_no_ui: called by advanced experimental feature modules; not shown in the UI in real time; simple parameters; can run in parallel threads, which makes complex logic easy to implement
3. predict_no_ui_long_connection: predict_no_ui tends to drop the connection to openai when processing long documents; this function solves that with streaming, and also supports multi-threading
"""

import logging
import traceback
import importlib
import openai
import time


# read the AZURE OPENAI API settings from config.py
from toolbox import get_conf, update_ui, clip_history, trimmed_format_exc
TIMEOUT_SECONDS, MAX_RETRY, AZURE_ENGINE, AZURE_ENDPOINT, AZURE_API_VERSION, AZURE_API_KEY = \
    get_conf('TIMEOUT_SECONDS', 'MAX_RETRY', "AZURE_ENGINE", "AZURE_ENDPOINT", "AZURE_API_VERSION", "AZURE_API_KEY")


def get_full_error(chunk, stream_response):
    """
    Get the complete error message returned by OpenAI.
    """
    while True:
        try:
            chunk += next(stream_response)
        except:
            break
    return chunk


def predict(inputs, llm_kwargs, plugin_kwargs, chatbot, history=[], system_prompt='', stream=True, additional_fn=None):
    """
    Send to the Azure OpenAI API and fetch the output as a stream.
    Used for the basic conversation feature.
    inputs is the input of this query
    top_p, temperature are internal tuning parameters of chatGPT
    history is the list of previous conversations (note that if either inputs or history gets too long, it triggers a token-overflow error)
    chatbot is the conversation list shown in the WebUI; modify it and yield it out to update the conversation UI directly
    additional_fn indicates which button was clicked; see functional.py for the buttons
    """
    print(llm_kwargs["llm_model"])

    if additional_fn is not None:
        import core_functional
        importlib.reload(core_functional)    # hot-reload the prompt
        core_functional = core_functional.get_core_functions()
        if "PreProcess" in core_functional[additional_fn]: inputs = core_functional[additional_fn]["PreProcess"](inputs)  # fetch the pre-processing function (if any)
        inputs = core_functional[additional_fn]["Prefix"] + inputs + core_functional[additional_fn]["Suffix"]

    raw_input = inputs
    logging.info(f'[raw_input] {raw_input}')
    chatbot.append((inputs, ""))
    yield from update_ui(chatbot=chatbot, history=history, msg="等待响应")  # refresh the UI

    payload = generate_azure_payload(inputs, llm_kwargs, history, system_prompt, stream)

    history.append(inputs); history.append("")

    retry = 0
    while True:
        try:
            openai.api_type = "azure"
            openai.api_version = AZURE_API_VERSION
            openai.api_base = AZURE_ENDPOINT
            openai.api_key = AZURE_API_KEY
            response = openai.ChatCompletion.create(timeout=TIMEOUT_SECONDS, **payload); break

        except:
            retry += 1
            chatbot[-1] = ((chatbot[-1][0], "获取response失败,重试中。。。"))
            retry_msg = f",正在重试 ({retry}/{MAX_RETRY}) ……" if MAX_RETRY > 0 else ""
            yield from update_ui(chatbot=chatbot, history=history, msg="请求超时"+retry_msg)  # refresh the UI
            if retry > MAX_RETRY: raise TimeoutError

    gpt_replying_buffer = ""
    is_head_of_the_stream = True
    if stream:

        stream_response = response

        while True:
            try:
                chunk = next(stream_response)

            except StopIteration:
                from toolbox import regular_txt_to_markdown; tb_str = '```\n' + trimmed_format_exc() + '```'
                chatbot[-1] = (chatbot[-1][0], f"[Local Message] 远程返回错误: \n\n{tb_str} \n\n{regular_txt_to_markdown(chunk)}")
                yield from update_ui(chatbot=chatbot, history=history, msg="远程返回错误:" + chunk)  # refresh the UI
                return

            if is_head_of_the_stream and (r'"object":"error"' not in chunk):
                # the first frame of the stream carries no content
                is_head_of_the_stream = False; continue

            if chunk:
                #print(chunk)
                try:
                    if "delta" in chunk["choices"][0]:
                        if chunk["choices"][0]["finish_reason"] == "stop":
                            logging.info(f'[response] {gpt_replying_buffer}')
                            break
                        status_text = f"finish_reason: {chunk['choices'][0]['finish_reason']}"
                        gpt_replying_buffer = gpt_replying_buffer + chunk["choices"][0]["delta"]["content"]

                    history[-1] = gpt_replying_buffer
                    chatbot[-1] = (history[-2], history[-1])
                    yield from update_ui(chatbot=chatbot, history=history, msg=status_text)  # refresh the UI

                except Exception as e:
                    traceback.print_exc()
                    yield from update_ui(chatbot=chatbot, history=history, msg="Json解析不合常规")  # refresh the UI
                    chunk = get_full_error(chunk, stream_response)

                    error_msg = chunk
                    yield from update_ui(chatbot=chatbot, history=history, msg="Json异常" + error_msg)  # refresh the UI
                    return


def predict_no_ui_long_connection(inputs, llm_kwargs, history=[], sys_prompt="", observe_window=None, console_slience=False):
    """
    Send to the AZURE OPENAI API and wait for the reply, completed in one go, without showing intermediate progress. Internally it uses streaming to avoid the connection being cut off mid-way.
    inputs:
        the input of this query
    sys_prompt:
        the silent system prompt
    llm_kwargs:
        internal tuning parameters of chatGPT
    history:
        the list of previous conversations
    observe_window = None:
        responsible for passing the already-produced output across threads; most of the time it only serves a fancy visual effect and can be left empty. observe_window[0]: observation window. observe_window[1]: watchdog
    """
    watch_dog_patience = 5  # patience of the watchdog; 5 seconds is enough
    payload = generate_azure_payload(inputs, llm_kwargs, history, system_prompt=sys_prompt, stream=True)
    retry = 0
    while True:

        try:
            openai.api_type = "azure"
            openai.api_version = AZURE_API_VERSION
            openai.api_base = AZURE_ENDPOINT
            openai.api_key = AZURE_API_KEY
            response = openai.ChatCompletion.create(timeout=TIMEOUT_SECONDS, **payload); break

        except:
            retry += 1
            traceback.print_exc()
            if retry > MAX_RETRY: raise TimeoutError
            if MAX_RETRY!=0: print(f'请求超时,正在重试 ({retry}/{MAX_RETRY}) ……')


    stream_response = response
    result = ''
    while True:
        try: chunk = next(stream_response)
        except StopIteration:
            break
        except:
            chunk = next(stream_response)  # failed; retry once? if it fails again there is nothing more we can do.

        if len(chunk)==0: continue
        if not chunk.startswith('data:'):
            error_msg = get_full_error(chunk, stream_response)
            if "reduce the length" in error_msg:
                raise ConnectionAbortedError("AZURE OPENAI API拒绝了请求:" + error_msg)
            else:
                raise RuntimeError("AZURE OPENAI API拒绝了请求:" + error_msg)
        if ('data: [DONE]' in chunk): break

        delta = chunk["delta"]
        if len(delta) == 0: break
        if "role" in delta: continue
        if "content" in delta:
            result += delta["content"]
            if not console_slience: print(delta["content"], end='')
            if observe_window is not None:
                # observation window: display the data fetched so far
                if len(observe_window) >= 1: observe_window[0] += delta["content"]
                # watchdog: terminate if it has not been fed within the deadline
                if len(observe_window) >= 2:
                    if (time.time()-observe_window[1]) > watch_dog_patience:
                        raise RuntimeError("用户取消了程序。")
        else: raise RuntimeError("意外Json结构:"+delta)
    if chunk['finish_reason'] == 'length':
        raise ConnectionAbortedError("正常结束,但显示Token不足,导致输出不完整,请削减单次输入的文本量。")
    return result


def generate_azure_payload(inputs, llm_kwargs, history, system_prompt, stream):
    """
    Gather all the information, pick the LLM model, and generate the azure openai api request, ready for sending.
    """

    conversation_cnt = len(history) // 2

    messages = [{"role": "system", "content": system_prompt}]
    if conversation_cnt:
        for index in range(0, 2*conversation_cnt, 2):
            what_i_have_asked = {}
            what_i_have_asked["role"] = "user"
            what_i_have_asked["content"] = history[index]
            what_gpt_answer = {}
            what_gpt_answer["role"] = "assistant"
            what_gpt_answer["content"] = history[index+1]
            if what_i_have_asked["content"] != "":
                if what_gpt_answer["content"] == "": continue
                messages.append(what_i_have_asked)
                messages.append(what_gpt_answer)
            else:
                messages[-1]['content'] = what_gpt_answer['content']

    what_i_ask_now = {}
    what_i_ask_now["role"] = "user"
    what_i_ask_now["content"] = inputs
    messages.append(what_i_ask_now)

    payload = {
        "model": llm_kwargs['llm_model'],
        "messages": messages,
        "temperature": llm_kwargs['temperature'],  # 1.0,
        "top_p": llm_kwargs['top_p'],  # 1.0,
        "n": 1,
        "stream": stream,
        "presence_penalty": 0,
        "frequency_penalty": 0,
        "engine": AZURE_ENGINE
    }
    try:
        print(f" {llm_kwargs['llm_model']} : {conversation_cnt} : {inputs[:100]} ..........")
    except:
        print('输入中可能存在乱码。')
    return payload
```
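The core of `generate_azure_payload` is the loop that turns the flat `history` list of alternating `[user, assistant, user, assistant, ...]` strings into chat `messages`. A standalone sketch of that pairing logic (the helper name is hypothetical, and the empty-question branch of the original is omitted for brevity):

```python
# Standalone sketch of how a flat history list [user, assistant, ...]
# becomes a chat messages list. Simplified relative to the function
# above: pairs with an empty question are simply skipped here.
def build_messages(history, system_prompt, inputs):
    messages = [{"role": "system", "content": system_prompt}]
    for i in range(0, len(history) // 2 * 2, 2):
        question = {"role": "user", "content": history[i]}
        answer = {"role": "assistant", "content": history[i + 1]}
        if question["content"] != "":
            if answer["content"] == "":
                continue  # drop pairs whose answer never arrived
            messages.append(question)
            messages.append(answer)
    # the current query always goes last
    messages.append({"role": "user", "content": inputs})
    return messages

msgs = build_messages(["hi", "hello!"], "You are helpful.", "How are you?")
print([m["role"] for m in msgs])  # -> ['system', 'user', 'assistant', 'user']
```

This is why both `predict` functions append `inputs` to `history` only after building the payload: the current query must not be paired up as a past turn.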