镜像自地址
https://github.com/binary-husky/gpt_academic.git
已同步 2025-12-06 14:36:48 +00:00
比较提交
31 次代码提交
version3.4
...
version3.4
| 作者 | SHA1 | 提交日期 | |
|---|---|---|---|
|
|
49253c4dc6 | ||
|
|
1a00093015 | ||
|
|
64f76e7401 | ||
|
|
eb4c07997e | ||
|
|
d684b4cdb3 | ||
|
|
601a95c948 | ||
|
|
e18bef2e9c | ||
|
|
f654c1af31 | ||
|
|
e90048a671 | ||
|
|
ea624b1510 | ||
|
|
057e3dda3c | ||
|
|
4290821a50 | ||
|
|
280e14d7b7 | ||
|
|
9f0cf9fb2b | ||
|
|
b8560b7510 | ||
|
|
d841d13b04 | ||
|
|
efda9e5193 | ||
|
|
33d2e75aac | ||
|
|
74941170aa | ||
|
|
cd38949903 | ||
|
|
d87f1eb171 | ||
|
|
cd1e4e1ba7 | ||
|
|
cf5f348d70 | ||
|
|
f3e4e26e2f | ||
|
|
d5bab093f9 | ||
|
|
f94b167dc2 | ||
|
|
016d8ee156 | ||
|
|
dca9ec4bae | ||
|
|
7fdf0a8e51 | ||
|
|
9a5a509dd9 | ||
|
|
f3205994ea |
30
README.md
30
README.md
@@ -97,7 +97,7 @@ cd gpt_academic
|
||||
|
||||
2. 配置API_KEY
|
||||
|
||||
在`config.py`中,配置API KEY等设置,[特殊网络环境设置](https://github.com/binary-husky/gpt_academic/issues/1) 。
|
||||
在`config.py`中,配置API KEY等设置,[点击查看特殊网络环境设置方法](https://github.com/binary-husky/gpt_academic/issues/1) 。
|
||||
|
||||
(P.S. 程序运行时会优先检查是否存在名为`config_private.py`的私密配置文件,并用其中的配置覆盖`config.py`的同名配置。因此,如果您能理解我们的配置读取逻辑,我们强烈建议您在`config.py`旁边创建一个名为`config_private.py`的新配置文件,并把`config.py`中的配置转移(复制)到`config_private.py`中。`config_private.py`不受git管控,可以让您的隐私信息更加安全。P.S.项目同样支持通过`环境变量`配置大多数选项,环境变量的书写格式参考`docker-compose`文件。读取优先级: `环境变量` > `config_private.py` > `config.py`)
|
||||
|
||||
@@ -140,15 +140,9 @@ AVAIL_LLM_MODELS = ["gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-
|
||||
python main.py
|
||||
```
|
||||
|
||||
5. 测试函数插件
|
||||
```
|
||||
- 测试函数插件模板函数(要求gpt回答历史上的今天发生了什么),您可以根据此函数为模板,实现更复杂的功能
|
||||
点击 "[函数插件模板Demo] 历史上的今天"
|
||||
```
|
||||
|
||||
## 安装-方法2:使用Docker
|
||||
|
||||
1. 仅ChatGPT(推荐大多数人选择)
|
||||
1. 仅ChatGPT(推荐大多数人选择,等价于docker-compose方案1)
|
||||
|
||||
``` sh
|
||||
git clone https://github.com/binary-husky/gpt_academic.git # 下载项目
|
||||
@@ -161,41 +155,43 @@ docker run --rm -it --net=host gpt-academic
|
||||
#(最后一步-选择2)在macOS/windows环境下,只能用-p选项将容器上的端口(例如50923)暴露给主机上的端口
|
||||
docker run --rm -it -e WEB_PORT=50923 -p 50923:50923 gpt-academic
|
||||
```
|
||||
P.S. 如果需要依赖Latex的插件功能,请见Wiki
|
||||
P.S. 如果需要依赖Latex的插件功能,请见Wiki。另外,您也可以直接使用docker-compose获取Latex功能(修改docker-compose.yml,保留方案4并删除其他方案)。
|
||||
|
||||
2. ChatGPT + ChatGLM + MOSS(需要熟悉Docker)
|
||||
|
||||
``` sh
|
||||
# 修改docker-compose.yml,删除方案1和方案3,保留方案2。修改docker-compose.yml中方案2的配置,参考其中注释即可
|
||||
# 修改docker-compose.yml,保留方案2并删除其他方案。修改docker-compose.yml中方案2的配置,参考其中注释即可
|
||||
docker-compose up
|
||||
```
|
||||
|
||||
3. ChatGPT + LLAMA + 盘古 + RWKV(需要熟悉Docker)
|
||||
``` sh
|
||||
# 修改docker-compose.yml,删除方案1和方案2,保留方案3。修改docker-compose.yml中方案3的配置,参考其中注释即可
|
||||
# 修改docker-compose.yml,保留方案3并删除其他方案。修改docker-compose.yml中方案3的配置,参考其中注释即可
|
||||
docker-compose up
|
||||
```
|
||||
|
||||
|
||||
## 安装-方法3:其他部署姿势
|
||||
1. 一键运行脚本。
|
||||
完全不熟悉python环境的Windows用户可以下载[Release](https://github.com/binary-husky/gpt_academic/releases)中发布的一键运行脚本安装无本地模型的版本,
|
||||
不建议电脑上已有python的用户采用此方法(在此基础上安装插件的依赖很麻烦)。
|
||||
完全不熟悉python环境的Windows用户可以下载[Release](https://github.com/binary-husky/gpt_academic/releases)中发布的一键运行脚本安装无本地模型的版本。
|
||||
脚本的贡献来源是[oobabooga](https://github.com/oobabooga/one-click-installers)。
|
||||
|
||||
2. 使用docker-compose运行。
|
||||
请阅读docker-compose.yml后,按照其中的提示操作即可
|
||||
|
||||
3. 如何使用反代URL/微软云AzureAPI。
|
||||
3. 如何使用反代URL
|
||||
按照`config.py`中的说明配置API_URL_REDIRECT即可。
|
||||
|
||||
4. 远程云服务器部署(需要云服务器知识与经验)。
|
||||
4. 微软云AzureAPI
|
||||
按照`config.py`中的说明配置即可(AZURE_ENDPOINT等四个配置)
|
||||
|
||||
5. 远程云服务器部署(需要云服务器知识与经验)。
|
||||
请访问[部署wiki-1](https://github.com/binary-husky/gpt_academic/wiki/%E4%BA%91%E6%9C%8D%E5%8A%A1%E5%99%A8%E8%BF%9C%E7%A8%8B%E9%83%A8%E7%BD%B2%E6%8C%87%E5%8D%97)
|
||||
|
||||
5. 使用WSL2(Windows Subsystem for Linux 子系统)。
|
||||
6. 使用WSL2(Windows Subsystem for Linux 子系统)。
|
||||
请访问[部署wiki-2](https://github.com/binary-husky/gpt_academic/wiki/%E4%BD%BF%E7%94%A8WSL2%EF%BC%88Windows-Subsystem-for-Linux-%E5%AD%90%E7%B3%BB%E7%BB%9F%EF%BC%89%E9%83%A8%E7%BD%B2)
|
||||
|
||||
6. 如何在二级网址(如`http://localhost/subpath`)下运行。
|
||||
7. 如何在二级网址(如`http://localhost/subpath`)下运行。
|
||||
请访问[FastAPI运行说明](docs/WithFastapi.md)
|
||||
|
||||
---
|
||||
|
||||
12
config.py
12
config.py
@@ -1,6 +1,7 @@
|
||||
# [step 1]>> 例如: API_KEY = "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" (此key无效)
|
||||
API_KEY = "sk-此处填API密钥" # 可同时填写多个API-KEY,用英文逗号分割,例如API_KEY = "sk-openaikey1,sk-openaikey2,fkxxxx-api2dkey1,fkxxxx-api2dkey2"
|
||||
|
||||
|
||||
# [step 2]>> 改为True应用代理,如果直接在海外服务器部署,此处不修改
|
||||
USE_PROXY = False
|
||||
if USE_PROXY:
|
||||
@@ -46,8 +47,8 @@ MAX_RETRY = 2
|
||||
|
||||
# 模型选择是 (注意: LLM_MODEL是默认选中的模型, 同时它必须被包含在AVAIL_LLM_MODELS切换列表中 )
|
||||
LLM_MODEL = "gpt-3.5-turbo" # 可选 ↓↓↓
|
||||
AVAIL_LLM_MODELS = ["gpt-3.5-turbo-16k", "gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "chatglm", "moss", "newbing", "newbing-free", "stack-claude"]
|
||||
# P.S. 其他可用的模型还包括 ["newbing-free", "jittorllms_rwkv", "jittorllms_pangualpha", "jittorllms_llama"]
|
||||
AVAIL_LLM_MODELS = ["gpt-3.5-turbo-16k", "gpt-3.5-turbo", "azure-gpt35", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "chatglm", "moss", "newbing", "newbing-free", "stack-claude"]
|
||||
# P.S. 其他可用的模型还包括 ["gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k-0613", "newbing-free", "jittorllms_rwkv", "jittorllms_pangualpha", "jittorllms_llama"]
|
||||
|
||||
# 本地LLM模型如ChatGLM的执行方式 CPU/GPU
|
||||
LOCAL_MODEL_DEVICE = "cpu" # 可选 "cuda"
|
||||
@@ -81,3 +82,10 @@ your bing cookies here
|
||||
# 如果需要使用Slack Claude,使用教程详情见 request_llm/README.md
|
||||
SLACK_CLAUDE_BOT_ID = ''
|
||||
SLACK_CLAUDE_USER_TOKEN = ''
|
||||
|
||||
|
||||
# 如果需要使用AZURE 详情请见额外文档 docs\use_azure.md
|
||||
AZURE_ENDPOINT = "https://你的api名称.openai.azure.com/"
|
||||
AZURE_API_KEY = "填入azure openai api的密钥"
|
||||
AZURE_API_VERSION = "填入api版本"
|
||||
AZURE_ENGINE = "填入ENGINE"
|
||||
|
||||
@@ -348,17 +348,28 @@ def get_crazy_functions():
|
||||
try:
|
||||
from crazy_functions.Latex输出PDF结果 import Latex英文纠错加PDF对比
|
||||
function_plugins.update({
|
||||
"[功能尚不稳定] Latex英文纠错+LatexDiff高亮修正位置": {
|
||||
"Latex英文纠错+高亮修正位置 [需Latex]": {
|
||||
"Color": "stop",
|
||||
"AsButton": False,
|
||||
# "AdvancedArgs": True,
|
||||
# "ArgsReminder": "",
|
||||
"AdvancedArgs": True,
|
||||
"ArgsReminder": "如果有必要, 请在此处追加更细致的矫错指令(使用英文)。",
|
||||
"Function": HotReload(Latex英文纠错加PDF对比)
|
||||
}
|
||||
})
|
||||
from crazy_functions.Latex输出PDF结果 import Latex翻译中文并重新编译PDF
|
||||
function_plugins.update({
|
||||
"Arixv翻译(输入arxivID) [需Latex]": {
|
||||
"Arixv翻译(输入arxivID)[需Latex]": {
|
||||
"Color": "stop",
|
||||
"AsButton": False,
|
||||
"AdvancedArgs": True,
|
||||
"ArgsReminder":
|
||||
"如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 "+
|
||||
"例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: " + 'If the term "agent" is used in this section, it should be translated to "智能体". ',
|
||||
"Function": HotReload(Latex翻译中文并重新编译PDF)
|
||||
}
|
||||
})
|
||||
function_plugins.update({
|
||||
"本地论文翻译(上传Latex压缩包)[需Latex]": {
|
||||
"Color": "stop",
|
||||
"AsButton": False,
|
||||
"AdvancedArgs": True,
|
||||
@@ -368,17 +379,6 @@ def get_crazy_functions():
|
||||
"Function": HotReload(Latex翻译中文并重新编译PDF)
|
||||
}
|
||||
})
|
||||
# function_plugins.update({
|
||||
# "本地论文翻译(上传Latex压缩包) [需Latex]": {
|
||||
# "Color": "stop",
|
||||
# "AsButton": False,
|
||||
# "AdvancedArgs": True,
|
||||
# "ArgsReminder":
|
||||
# "如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 "+
|
||||
# "例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: " + 'If the term "agent" is used in this section, it should be translated to "智能体". ',
|
||||
# "Function": HotReload(Latex翻译中文并重新编译PDF)
|
||||
# }
|
||||
# })
|
||||
except:
|
||||
print('Load function plugin failed')
|
||||
|
||||
|
||||
@@ -19,9 +19,9 @@ def switch_prompt(pfg, mode, more_requirement):
|
||||
- sys_prompt_array: A list of strings containing prompts for system prompts.
|
||||
"""
|
||||
n_split = len(pfg.sp_file_contents)
|
||||
if mode == 'proofread':
|
||||
if mode == 'proofread_en':
|
||||
inputs_array = [r"Below is a section from an academic paper, proofread this section." +
|
||||
r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " +
|
||||
r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " + more_requirement +
|
||||
r"Answer me only with the revised text:" +
|
||||
f"\n\n{frag}" for frag in pfg.sp_file_contents]
|
||||
sys_prompt_array = ["You are a professional academic paper writer." for _ in range(n_split)]
|
||||
@@ -70,6 +70,12 @@ def move_project(project_folder, arxiv_id=None):
|
||||
shutil.rmtree(new_workfolder)
|
||||
except:
|
||||
pass
|
||||
|
||||
# align subfolder if there is a folder wrapper
|
||||
items = glob.glob(pj(project_folder,'*'))
|
||||
if len(glob.glob(pj(project_folder,'*.tex'))) == 0 and len(items) == 1:
|
||||
if os.path.isdir(items[0]): project_folder = items[0]
|
||||
|
||||
shutil.copytree(src=project_folder, dst=new_workfolder)
|
||||
return new_workfolder
|
||||
|
||||
@@ -141,7 +147,11 @@ def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, histo
|
||||
chatbot.append([ "函数插件功能?",
|
||||
"对整个Latex项目进行纠错, 用latex编译为PDF对修正处做高亮。函数插件贡献者: Binary-Husky。注意事项: 目前仅支持GPT3.5/GPT4,其他模型转化效果未知。目前对机器学习类文献转化效果最好,其他类型文献转化效果未知。仅在Windows系统进行了测试,其他操作系统表现未知。"])
|
||||
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
|
||||
|
||||
|
||||
# <-------------- more requirements ------------->
|
||||
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
|
||||
more_req = plugin_kwargs.get("advanced_arg", "")
|
||||
_switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
|
||||
|
||||
# <-------------- check deps ------------->
|
||||
try:
|
||||
@@ -180,13 +190,13 @@ def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, histo
|
||||
|
||||
|
||||
# <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
|
||||
if not os.path.exists(project_folder + '/merge_proofread.tex'):
|
||||
if not os.path.exists(project_folder + '/merge_proofread_en.tex'):
|
||||
yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
|
||||
chatbot, history, system_prompt, mode='proofread_latex', switch_prompt=switch_prompt)
|
||||
chatbot, history, system_prompt, mode='proofread_en', switch_prompt=_switch_prompt_)
|
||||
|
||||
|
||||
# <-------------- compile PDF ------------->
|
||||
success = yield from 编译Latex(chatbot, history, main_file_original='merge', main_file_modified='merge_proofread',
|
||||
success = yield from 编译Latex(chatbot, history, main_file_original='merge', main_file_modified='merge_proofread_en',
|
||||
work_folder_original=project_folder, work_folder_modified=project_folder, work_folder=project_folder)
|
||||
|
||||
|
||||
@@ -195,6 +205,7 @@ def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, histo
|
||||
if success:
|
||||
chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
|
||||
yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
|
||||
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
|
||||
else:
|
||||
chatbot.append((f"失败了", '虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 也是可读的, 您可以到Github Issue区, 用该压缩包+对话历史存档进行反馈 ...'))
|
||||
yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
|
||||
@@ -278,6 +289,7 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
|
||||
if success:
|
||||
chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
|
||||
yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
|
||||
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
|
||||
else:
|
||||
chatbot.append((f"失败了", '虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 也是可读的, 您可以到Github Issue区, 用该压缩包+对话历史存档进行反馈 ...'))
|
||||
yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
|
||||
|
||||
@@ -188,7 +188,15 @@ def test_Latex():
|
||||
# txt = r"https://arxiv.org/abs/2305.17608"
|
||||
# txt = r"https://arxiv.org/abs/2211.16068" # ACE
|
||||
# txt = r"C:\Users\x\arxiv_cache\2211.16068\workfolder" # ACE
|
||||
txt = r"https://arxiv.org/abs/2002.09253"
|
||||
# txt = r"https://arxiv.org/abs/2002.09253"
|
||||
# txt = r"https://arxiv.org/abs/2306.07831"
|
||||
# txt = r"https://arxiv.org/abs/2212.10156"
|
||||
# txt = r"https://arxiv.org/abs/2211.11559"
|
||||
# txt = r"https://arxiv.org/abs/2303.08774"
|
||||
txt = r"https://arxiv.org/abs/2303.12712"
|
||||
# txt = r"C:\Users\fuqingxu\arxiv_cache\2303.12712\workfolder"
|
||||
|
||||
|
||||
for cookies, cb, hist, msg in (Latex翻译中文并重新编译PDF)(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
|
||||
cli_printer.print(cb) # print(cb)
|
||||
|
||||
@@ -217,6 +225,7 @@ def test_Latex():
|
||||
# test_数学动画生成manim()
|
||||
# test_Langchain知识库()
|
||||
# test_Langchain知识库读取()
|
||||
test_Latex()
|
||||
input("程序完成,回车退出。")
|
||||
print("退出。")
|
||||
if __name__ == "__main__":
|
||||
test_Latex()
|
||||
input("程序完成,回车退出。")
|
||||
print("退出。")
|
||||
@@ -8,24 +8,31 @@ pj = os.path.join
|
||||
"""
|
||||
========================================================================
|
||||
Part One
|
||||
Latex segmentation to a linklist
|
||||
Latex segmentation with a binary mask (PRESERVE=0, TRANSFORM=1)
|
||||
========================================================================
|
||||
"""
|
||||
PRESERVE = 0
|
||||
TRANSFORM = 1
|
||||
|
||||
def split_worker(text, mask, pattern, flags=0):
|
||||
def set_forbidden_text(text, mask, pattern, flags=0):
|
||||
"""
|
||||
Add a preserve text area in this paper
|
||||
e.g. with pattern = r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}"
|
||||
you can mask out (mask = PRESERVE so that text become untouchable for GPT)
|
||||
everything between "\begin{equation}" and "\end{equation}"
|
||||
"""
|
||||
if isinstance(pattern, list): pattern = '|'.join(pattern)
|
||||
pattern_compile = re.compile(pattern, flags)
|
||||
for res in pattern_compile.finditer(text):
|
||||
mask[res.span()[0]:res.span()[1]] = PRESERVE
|
||||
return text, mask
|
||||
|
||||
def split_worker_careful_brace(text, mask, pattern, flags=0):
|
||||
def set_forbidden_text_careful_brace(text, mask, pattern, flags=0):
|
||||
"""
|
||||
Move area into preserve area
|
||||
Add a preserve text area in this paper (text become untouchable for GPT).
|
||||
count the number of the braces so as to catch compelete text area.
|
||||
e.g.
|
||||
\caption{blablablablabla\texbf{blablabla}blablabla.}
|
||||
"""
|
||||
pattern_compile = re.compile(pattern, flags)
|
||||
for res in pattern_compile.finditer(text):
|
||||
@@ -40,9 +47,12 @@ def split_worker_careful_brace(text, mask, pattern, flags=0):
|
||||
mask[begin:end] = PRESERVE
|
||||
return text, mask
|
||||
|
||||
def split_worker_reverse_careful_brace(text, mask, pattern, flags=0):
|
||||
def reverse_forbidden_text_careful_brace(text, mask, pattern, flags=0, forbid_wrapper=True):
|
||||
"""
|
||||
Move area out of preserve area
|
||||
Move area out of preserve area (make text editable for GPT)
|
||||
count the number of the braces so as to catch compelete text area.
|
||||
e.g.
|
||||
\caption{blablablablabla\texbf{blablabla}blablabla.}
|
||||
"""
|
||||
pattern_compile = re.compile(pattern, flags)
|
||||
for res in pattern_compile.finditer(text):
|
||||
@@ -55,9 +65,12 @@ def split_worker_reverse_careful_brace(text, mask, pattern, flags=0):
|
||||
p += 1
|
||||
end = p
|
||||
mask[begin:end] = TRANSFORM
|
||||
if forbid_wrapper:
|
||||
mask[res.regs[0][0]:begin] = PRESERVE
|
||||
mask[end:res.regs[0][1]] = PRESERVE
|
||||
return text, mask
|
||||
|
||||
def split_worker_begin_end(text, mask, pattern, flags=0, limit_n_lines=42):
|
||||
def set_forbidden_text_begin_end(text, mask, pattern, flags=0, limit_n_lines=42):
|
||||
"""
|
||||
Find all \begin{} ... \end{} text block that with less than limit_n_lines lines.
|
||||
Add it to preserve area
|
||||
@@ -110,19 +123,41 @@ Latex Merge File
|
||||
def 寻找Latex主文件(file_manifest, mode):
|
||||
"""
|
||||
在多Tex文档中,寻找主文件,必须包含documentclass,返回找到的第一个。
|
||||
P.S. 但愿没人把latex模板放在里面传进来
|
||||
P.S. 但愿没人把latex模板放在里面传进来 (6.25 加入判定latex模板的代码)
|
||||
"""
|
||||
canidates = []
|
||||
for texf in file_manifest:
|
||||
if os.path.basename(texf).startswith('merge'):
|
||||
continue
|
||||
with open(texf, 'r', encoding='utf8') as f:
|
||||
file_content = f.read()
|
||||
if r'\documentclass' in file_content:
|
||||
return texf
|
||||
canidates.append(texf)
|
||||
else:
|
||||
continue
|
||||
raise RuntimeError('无法找到一个主Tex文件(包含documentclass关键字)')
|
||||
|
||||
if len(canidates) == 0:
|
||||
raise RuntimeError('无法找到一个主Tex文件(包含documentclass关键字)')
|
||||
elif len(canidates) == 1:
|
||||
return canidates[0]
|
||||
else: # if len(canidates) >= 2 通过一些Latex模板中常见(但通常不会出现在正文)的单词,对不同latex源文件扣分,取评分最高者返回
|
||||
canidates_score = []
|
||||
# 给出一些判定模板文档的词作为扣分项
|
||||
unexpected_words = ['\LaTeX', 'manuscript', 'Guidelines', 'font', 'citations', 'rejected', 'blind review', 'reviewers']
|
||||
expected_words = ['\input', '\ref', '\cite']
|
||||
for texf in canidates:
|
||||
canidates_score.append(0)
|
||||
with open(texf, 'r', encoding='utf8') as f:
|
||||
file_content = f.read()
|
||||
for uw in unexpected_words:
|
||||
if uw in file_content:
|
||||
canidates_score[-1] -= 1
|
||||
for uw in expected_words:
|
||||
if uw in file_content:
|
||||
canidates_score[-1] += 1
|
||||
select = np.argmax(canidates_score) # 取评分最高者返回
|
||||
return canidates[select]
|
||||
|
||||
def rm_comments(main_file):
|
||||
new_file_remove_comment_lines = []
|
||||
for l in main_file.splitlines():
|
||||
@@ -132,6 +167,7 @@ def rm_comments(main_file):
|
||||
else:
|
||||
new_file_remove_comment_lines.append(l)
|
||||
main_file = '\n'.join(new_file_remove_comment_lines)
|
||||
# main_file = re.sub(r"\\include{(.*?)}", r"\\input{\1}", main_file) # 将 \include 命令转换为 \input 命令
|
||||
main_file = re.sub(r'(?<!\\)%.*', '', main_file) # 使用正则表达式查找半行注释, 并替换为空字符串
|
||||
return main_file
|
||||
|
||||
@@ -178,9 +214,11 @@ def merge_tex_files(project_foler, main_file, mode):
|
||||
main_file = re.sub(r"\\documentclass\[(.*?)\]{(.*?)}", r"\\documentclass[\1,fontset=windows,UTF8]{\2}",main_file)
|
||||
main_file = re.sub(r"\\documentclass{(.*?)}", r"\\documentclass[fontset=windows,UTF8]{\1}",main_file)
|
||||
# find paper abstract
|
||||
pattern = re.compile(r'\\begin\{abstract\}.*\n')
|
||||
match = pattern.search(main_file)
|
||||
assert match is not None, "Cannot find paper abstract section!"
|
||||
pattern_opt1 = re.compile(r'\\begin\{abstract\}.*\n')
|
||||
pattern_opt2 = re.compile(r"\\abstract\{(.*?)\}", flags=re.DOTALL)
|
||||
match_opt1 = pattern_opt1.search(main_file)
|
||||
match_opt2 = pattern_opt2.search(main_file)
|
||||
assert (match_opt1 is not None) or (match_opt2 is not None), "Cannot find paper abstract section!"
|
||||
return main_file
|
||||
|
||||
|
||||
@@ -212,6 +250,8 @@ def fix_content(final_tex, node_string):
|
||||
final_tex = re.sub(r"\\\ ([a-z]{2,10})\{", r"\\\1{", string=final_tex)
|
||||
final_tex = re.sub(r"\\([a-z]{2,10})\{([^\}]*?)\}", mod_inbraket, string=final_tex)
|
||||
|
||||
if "Traceback" in final_tex and "[Local Message]" in final_tex:
|
||||
final_tex = node_string # 出问题了,还原原文
|
||||
if node_string.count('\\begin') != final_tex.count('\\begin'):
|
||||
final_tex = node_string # 出问题了,还原原文
|
||||
if node_string.count('\_') > 0 and node_string.count('\_') > final_tex.count('\_'):
|
||||
@@ -259,45 +299,33 @@ def split_subprocess(txt, project_folder, return_dict, opts):
|
||||
mask = np.zeros(len(txt), dtype=np.uint8) + TRANSFORM
|
||||
|
||||
# 吸收title与作者以上的部分
|
||||
text, mask = split_worker(text, mask, r"(.*?)\\maketitle", re.DOTALL)
|
||||
# 删除iffalse注释
|
||||
text, mask = split_worker(text, mask, r"\\iffalse(.*?)\\fi", re.DOTALL)
|
||||
# 吸收在25行以内的begin-end组合
|
||||
text, mask = split_worker_begin_end(text, mask, r"\\begin\{([a-z\*]*)\}(.*?)\\end\{\1\}", re.DOTALL, limit_n_lines=25)
|
||||
text, mask = set_forbidden_text(text, mask, r"(.*?)\\maketitle", re.DOTALL)
|
||||
# 吸收iffalse注释
|
||||
text, mask = set_forbidden_text(text, mask, r"\\iffalse(.*?)\\fi", re.DOTALL)
|
||||
# 吸收在42行以内的begin-end组合
|
||||
text, mask = set_forbidden_text_begin_end(text, mask, r"\\begin\{([a-z\*]*)\}(.*?)\\end\{\1\}", re.DOTALL, limit_n_lines=42)
|
||||
# 吸收匿名公式
|
||||
text, mask = split_worker(text, mask, r"\$\$(.*?)\$\$", re.DOTALL)
|
||||
text, mask = set_forbidden_text(text, mask, [ r"\$\$(.*?)\$\$", r"\\\[.*?\\\]" ], re.DOTALL)
|
||||
# 吸收其他杂项
|
||||
text, mask = split_worker(text, mask, r"\\section\{(.*?)\}")
|
||||
text, mask = split_worker(text, mask, r"\\section\*\{(.*?)\}")
|
||||
text, mask = split_worker(text, mask, r"\\subsection\{(.*?)\}")
|
||||
text, mask = split_worker(text, mask, r"\\subsubsection\{(.*?)\}")
|
||||
text, mask = split_worker(text, mask, r"\\bibliography\{(.*?)\}")
|
||||
text, mask = split_worker(text, mask, r"\\bibliographystyle\{(.*?)\}")
|
||||
text, mask = split_worker(text, mask, r"\\begin\{lstlisting\}(.*?)\\end\{lstlisting\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{wraptable\}(.*?)\\end\{wraptable\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{wrapfigure\}(.*?)\\end\{wrapfigure\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{wrapfigure\*\}(.*?)\\end\{wrapfigure\*\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{figure\}(.*?)\\end\{figure\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{figure\*\}(.*?)\\end\{figure\*\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{multline\}(.*?)\\end\{multline\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{multline\*\}(.*?)\\end\{multline\*\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{table\}(.*?)\\end\{table\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{table\*\}(.*?)\\end\{table\*\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{minipage\}(.*?)\\end\{minipage\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{minipage\*\}(.*?)\\end\{minipage\*\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{align\*\}(.*?)\\end\{align\*\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{align\}(.*?)\\end\{align\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{equation\}(.*?)\\end\{equation\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\begin\{equation\*\}(.*?)\\end\{equation\*\}", re.DOTALL)
|
||||
text, mask = split_worker(text, mask, r"\\item ")
|
||||
text, mask = split_worker(text, mask, r"\\label\{(.*?)\}")
|
||||
text, mask = split_worker(text, mask, r"\\begin\{(.*?)\}")
|
||||
text, mask = split_worker(text, mask, r"\\vspace\{(.*?)\}")
|
||||
text, mask = split_worker(text, mask, r"\\hspace\{(.*?)\}")
|
||||
text, mask = split_worker(text, mask, r"\\end\{(.*?)\}")
|
||||
text, mask = split_worker_careful_brace(text, mask, r"\\hl\{(.*?)\}", re.DOTALL)
|
||||
text, mask = split_worker_reverse_careful_brace(text, mask, r"\\caption\{(.*?)\}", re.DOTALL)
|
||||
text, mask = set_forbidden_text(text, mask, [ r"\\section\{(.*?)\}", r"\\section\*\{(.*?)\}", r"\\subsection\{(.*?)\}", r"\\subsubsection\{(.*?)\}" ])
|
||||
text, mask = set_forbidden_text(text, mask, [ r"\\bibliography\{(.*?)\}", r"\\bibliographystyle\{(.*?)\}" ])
|
||||
text, mask = set_forbidden_text(text, mask, r"\\begin\{thebibliography\}.*?\\end\{thebibliography\}", re.DOTALL)
|
||||
text, mask = set_forbidden_text(text, mask, r"\\begin\{lstlisting\}(.*?)\\end\{lstlisting\}", re.DOTALL)
|
||||
text, mask = set_forbidden_text(text, mask, r"\\begin\{wraptable\}(.*?)\\end\{wraptable\}", re.DOTALL)
|
||||
text, mask = set_forbidden_text(text, mask, r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}", re.DOTALL)
|
||||
text, mask = set_forbidden_text(text, mask, [r"\\begin\{wrapfigure\}(.*?)\\end\{wrapfigure\}", r"\\begin\{wrapfigure\*\}(.*?)\\end\{wrapfigure\*\}"], re.DOTALL)
|
||||
text, mask = set_forbidden_text(text, mask, [r"\\begin\{figure\}(.*?)\\end\{figure\}", r"\\begin\{figure\*\}(.*?)\\end\{figure\*\}"], re.DOTALL)
|
||||
text, mask = set_forbidden_text(text, mask, [r"\\begin\{multline\}(.*?)\\end\{multline\}", r"\\begin\{multline\*\}(.*?)\\end\{multline\*\}"], re.DOTALL)
|
||||
text, mask = set_forbidden_text(text, mask, [r"\\begin\{table\}(.*?)\\end\{table\}", r"\\begin\{table\*\}(.*?)\\end\{table\*\}"], re.DOTALL)
|
||||
text, mask = set_forbidden_text(text, mask, [r"\\begin\{minipage\}(.*?)\\end\{minipage\}", r"\\begin\{minipage\*\}(.*?)\\end\{minipage\*\}"], re.DOTALL)
|
||||
text, mask = set_forbidden_text(text, mask, [r"\\begin\{align\*\}(.*?)\\end\{align\*\}", r"\\begin\{align\}(.*?)\\end\{align\}"], re.DOTALL)
|
||||
text, mask = set_forbidden_text(text, mask, [r"\\begin\{equation\}(.*?)\\end\{equation\}", r"\\begin\{equation\*\}(.*?)\\end\{equation\*\}"], re.DOTALL)
|
||||
text, mask = set_forbidden_text(text, mask, [r"\\includepdf\[(.*?)\]\{(.*?)\}", r"\\clearpage", r"\\newpage", r"\\appendix", r"\\tableofcontents", r"\\include\{(.*?)\}"])
|
||||
text, mask = set_forbidden_text(text, mask, [r"\\vspace\{(.*?)\}", r"\\hspace\{(.*?)\}", r"\\label\{(.*?)\}", r"\\begin\{(.*?)\}", r"\\end\{(.*?)\}", r"\\item "])
|
||||
text, mask = set_forbidden_text_careful_brace(text, mask, r"\\hl\{(.*?)\}", re.DOTALL)
|
||||
# reverse 操作必须放在最后
|
||||
text, mask = reverse_forbidden_text_careful_brace(text, mask, r"\\caption\{(.*?)\}", re.DOTALL, forbid_wrapper=True)
|
||||
text, mask = reverse_forbidden_text_careful_brace(text, mask, r"\\abstract\{(.*?)\}", re.DOTALL, forbid_wrapper=True)
|
||||
root = convert_to_linklist(text, mask)
|
||||
|
||||
# 修复括号
|
||||
@@ -371,7 +399,7 @@ def split_subprocess(txt, project_folder, return_dict, opts):
|
||||
prev_node = node
|
||||
node = node.next
|
||||
if node is None: break
|
||||
|
||||
# 输出html调试文件,用红色标注处保留区(PRESERVE),用黑色标注转换区(TRANSFORM)
|
||||
with open(pj(project_folder, 'debug_log.html'), 'w', encoding='utf8') as f:
|
||||
segment_parts_for_gpt = []
|
||||
nodes = []
|
||||
@@ -402,7 +430,7 @@ class LatexPaperSplit():
|
||||
"""
|
||||
def __init__(self) -> None:
|
||||
self.nodes = None
|
||||
self.msg = "{\\scriptsize\\textbf{警告:该PDF由GPT-Academic开源项目调用大语言模型+Latex翻译插件一键生成," + \
|
||||
self.msg = "*{\\scriptsize\\textbf{警告:该PDF由GPT-Academic开源项目调用大语言模型+Latex翻译插件一键生成," + \
|
||||
"版权归原文作者所有。翻译内容可靠性无保障,请仔细鉴别并以原文为准。" + \
|
||||
"项目Github地址 \\url{https://github.com/binary-husky/gpt_academic/}。"
|
||||
# 请您不要删除或修改这行警告,除非您是论文的原作者(如果您是论文原作者,欢迎加REAME中的QQ联系开发者)
|
||||
@@ -423,8 +451,14 @@ class LatexPaperSplit():
|
||||
if mode == 'translate_zh':
|
||||
pattern = re.compile(r'\\begin\{abstract\}.*\n')
|
||||
match = pattern.search(result_string)
|
||||
assert match is not None, "Cannot find paper abstract section!"
|
||||
position = match.end()
|
||||
if not match:
|
||||
# match \abstract{xxxx}
|
||||
pattern_compile = re.compile(r"\\abstract\{(.*?)\}", flags=re.DOTALL)
|
||||
match = pattern_compile.search(result_string)
|
||||
position = match.regs[1][0]
|
||||
else:
|
||||
# match \begin{abstract}xxxx\end{abstract}
|
||||
position = match.end()
|
||||
result_string = result_string[:position] + self.msg + msg + self.msg_declare + result_string[position:]
|
||||
return result_string
|
||||
|
||||
@@ -443,6 +477,7 @@ class LatexPaperSplit():
|
||||
args=(txt, project_folder, return_dict, opts))
|
||||
p.start()
|
||||
p.join()
|
||||
p.close()
|
||||
self.nodes = return_dict['nodes']
|
||||
self.sp = return_dict['segment_parts_for_gpt']
|
||||
return self.sp
|
||||
@@ -497,11 +532,11 @@ class LatexPaperFileGroup():
|
||||
f.write(res)
|
||||
return manifest
|
||||
|
||||
def write_html(sp_file_contents, sp_file_result, chatbot):
|
||||
def write_html(sp_file_contents, sp_file_result, chatbot, project_folder):
|
||||
|
||||
# write html
|
||||
try:
|
||||
import copy
|
||||
import shutil
|
||||
from .crazy_utils import construct_html
|
||||
from toolbox import gen_time_str
|
||||
ch = construct_html()
|
||||
@@ -519,6 +554,7 @@ def write_html(sp_file_contents, sp_file_result, chatbot):
|
||||
ch.add_row(a=orig, b=trans)
|
||||
create_report_file_name = f"{gen_time_str()}.trans.html"
|
||||
ch.save_file(create_report_file_name)
|
||||
shutil.copyfile(pj('./gpt_log/', create_report_file_name), pj(project_folder, create_report_file_name))
|
||||
promote_file_to_downloadzone(file=f'./gpt_log/{create_report_file_name}', chatbot=chatbot)
|
||||
except:
|
||||
from toolbox import trimmed_format_exc
|
||||
@@ -599,7 +635,7 @@ def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin
|
||||
pfg.get_token_num = None
|
||||
objdump(pfg, file=pj(project_folder,'temp.pkl'))
|
||||
|
||||
write_html(pfg.sp_file_contents, pfg.sp_file_result, chatbot=chatbot)
|
||||
write_html(pfg.sp_file_contents, pfg.sp_file_result, chatbot=chatbot, project_folder=project_folder)
|
||||
|
||||
# <-------- 写出文件 ---------->
|
||||
msg = f"当前大语言模型: {llm_kwargs['llm_model']},当前语言模型温度设定: {llm_kwargs['temperature']}。"
|
||||
@@ -706,13 +742,15 @@ def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_f
|
||||
results_ += f"对比PDF编译是否成功: {diff_pdf_success};"
|
||||
yield from update_ui_lastest_msg(f'第{n_fix}编译结束:<br/>{results_}...', chatbot, history) # 刷新Gradio前端界面
|
||||
|
||||
if diff_pdf_success:
|
||||
result_pdf = pj(work_folder_modified, f'merge_diff.pdf') # get pdf path
|
||||
promote_file_to_downloadzone(result_pdf, rename_file=None, chatbot=chatbot) # promote file to web UI
|
||||
if modified_pdf_success:
|
||||
yield from update_ui_lastest_msg(f'转化PDF编译已经成功, 即将退出 ...', chatbot, history) # 刷新Gradio前端界面
|
||||
os.chdir(current_dir)
|
||||
result_pdf = pj(work_folder_modified, f'{main_file_modified}.pdf')
|
||||
result_pdf = pj(work_folder_modified, f'{main_file_modified}.pdf') # get pdf path
|
||||
if os.path.exists(pj(work_folder, '..', 'translation')):
|
||||
shutil.copyfile(result_pdf, pj(work_folder, '..', 'translation', 'translate_zh.pdf'))
|
||||
promote_file_to_downloadzone(result_pdf, rename_file=None, chatbot=chatbot)
|
||||
promote_file_to_downloadzone(result_pdf, rename_file=None, chatbot=chatbot) # promote file to web UI
|
||||
return True # 成功啦
|
||||
else:
|
||||
if n_fix>=max_try: break
|
||||
|
||||
@@ -13,7 +13,9 @@ def 解析PDF(file_name, llm_kwargs, plugin_kwargs, chatbot, history, system_pro
|
||||
# 递归地切割PDF文件,每一块(尽量是完整的一个section,比如introduction,experiment等,必要时再进行切割)
|
||||
# 的长度必须小于 2500 个 Token
|
||||
file_content, page_one = read_and_clean_pdf_text(file_name) # (尝试)按照章节切割PDF
|
||||
|
||||
file_content = file_content.encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
|
||||
page_one = str(page_one).encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
|
||||
|
||||
TOKEN_LIMIT_PER_FRAGMENT = 2500
|
||||
|
||||
from .crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf
|
||||
|
||||
@@ -103,3 +103,30 @@ services:
|
||||
echo '[jittorllms] 正在从github拉取最新代码...' &&
|
||||
git --git-dir=request_llm/jittorllms/.git --work-tree=request_llm/jittorllms pull --force &&
|
||||
python3 -u main.py"
|
||||
|
||||
|
||||
## ===================================================
|
||||
## 【方案四】 chatgpt + Latex
|
||||
## ===================================================
|
||||
version: '3'
|
||||
services:
|
||||
gpt_academic_with_latex:
|
||||
image: ghcr.io/binary-husky/gpt_academic_with_latex:master
|
||||
environment:
|
||||
# 请查阅 `config.py` 以查看所有的配置信息
|
||||
API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
|
||||
USE_PROXY: ' True '
|
||||
proxies: ' { "http": "socks5h://localhost:10880", "https": "socks5h://localhost:10880", } '
|
||||
LLM_MODEL: ' gpt-3.5-turbo '
|
||||
AVAIL_LLM_MODELS: ' ["gpt-3.5-turbo", "gpt-4"] '
|
||||
LOCAL_MODEL_DEVICE: ' cuda '
|
||||
DEFAULT_WORKER_NUM: ' 10 '
|
||||
WEB_PORT: ' 12303 '
|
||||
|
||||
# 与宿主的网络融合
|
||||
network_mode: "host"
|
||||
|
||||
# 不使用代理网络拉取最新代码
|
||||
command: >
|
||||
bash -c "python3 -u main.py"
|
||||
|
||||
|
||||
152
docs/use_azure.md
普通文件
152
docs/use_azure.md
普通文件
@@ -0,0 +1,152 @@
|
||||
# 通过微软Azure云服务申请 Openai API
|
||||
|
||||
由于Openai和微软的关系,现在是可以通过微软的Azure云计算服务直接访问openai的api,免去了注册和网络的问题。
|
||||
|
||||
快速入门的官方文档的链接是:[快速入门 - 开始通过 Azure OpenAI 服务使用 ChatGPT 和 GPT-4 - Azure OpenAI Service | Microsoft Learn](https://learn.microsoft.com/zh-cn/azure/cognitive-services/openai/chatgpt-quickstart?pivots=programming-language-python)
|
||||
|
||||
# 申请API
|
||||
|
||||
按文档中的“先决条件”的介绍,出了编程的环境以外,还需要以下三个条件:
|
||||
|
||||
1. Azure账号并创建订阅
|
||||
|
||||
2. 为订阅添加Azure OpenAI 服务
|
||||
|
||||
3. 部署模型
|
||||
|
||||
## Azure账号并创建订阅
|
||||
|
||||
### Azure账号
|
||||
|
||||
创建Azure的账号时最好是有微软的账号,这样似乎更容易获得免费额度(第一个月的200美元,实测了一下,如果用一个刚注册的微软账号登录Azure的话,并没有这一个月的免费额度)。
|
||||
|
||||
创建Azure账号的网址是:[立即创建 Azure 免费帐户 | Microsoft Azure](https://azure.microsoft.com/zh-cn/free/)
|
||||
|
||||

|
||||
|
||||
打开网页后,点击 “免费开始使用” 会跳转到登录或注册页面,如果有微软的账户,直接登录即可,如果没有微软账户,那就需要到微软的网页再另行注册一个。
|
||||
|
||||
注意,Azure的页面和政策时不时会变化,已实际最新显示的为准就好。
|
||||
|
||||
### 创建订阅
|
||||
|
||||
注册好Azure后便可进入主页:
|
||||
|
||||

|
||||
|
||||
首先需要在订阅里进行添加操作,点开后即可进入订阅的页面:
|
||||
|
||||

|
||||
|
||||
第一次进来应该是空的,点添加即可创建新的订阅(可以是“免费”或者“即付即用”的订阅),其中订阅ID是后面申请Azure OpenAI需要使用的。
|
||||
|
||||
## 为订阅添加Azure OpenAI服务
|
||||
|
||||
之后回到首页,点Azure OpenAI即可进入OpenAI服务的页面(如果不显示的话,则在首页上方的搜索栏里搜索“openai”即可)。
|
||||
|
||||

|
||||
|
||||
不过现在这个服务还不能用。在使用前,还需要在这个网址申请一下:
|
||||
|
||||
[Request Access to Azure OpenAI Service (microsoft.com)](https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_Pz4b1_xUOFA5Qk1UWDRBMjg0WFhPMkIzTzhKQ1dWNyQlQCN0PWcu)
|
||||
|
||||
这里有二十来个问题,按照要求和自己的实际情况填写即可。
|
||||
|
||||
其中需要注意的是
|
||||
|
||||
1. 千万记得填对"订阅ID"
|
||||
|
||||
2. 需要填一个公司邮箱(可以不是注册用的邮箱)和公司网址
|
||||
|
||||
之后,在回到上面那个页面,点创建,就会进入创建页面了:
|
||||
|
||||

|
||||
|
||||
需要填入“资源组”和“名称”,按照自己的需要填入即可。
|
||||
|
||||
完成后,在主页的“资源”里就可以看到刚才创建的“资源”了,点击进入后,就可以进行最后的部署了。
|
||||
|
||||

|
||||
|
||||
## 部署模型
|
||||
|
||||
进入资源页面后,在部署模型前,可以先点击“开发”,把密钥和终结点记下来。
|
||||
|
||||

|
||||
|
||||
之后,就可以去部署模型了,点击“部署”即可,会跳转到 Azure OpenAI Stuido 进行下面的操作:
|
||||
|
||||

|
||||
|
||||
进入 Azure OpenAi Studio 后,点击新建部署,会弹出如下对话框:
|
||||
|
||||

|
||||
|
||||
在这里选 gpt-35-turbo 或需要的模型并按需要填入“部署名”即可完成模型的部署。
|
||||
|
||||

|
||||
|
||||
这个部署名需要记下来。
|
||||
|
||||
到现在为止,申请操作就完成了,需要记下来的有下面几个东西:
|
||||
|
||||
● 密钥(1或2都可以)
|
||||
|
||||
● 终结点
|
||||
|
||||
● 部署名(不是模型名)
|
||||
|
||||
# 修改 config.py
|
||||
|
||||
```
|
||||
AZURE_ENDPOINT = "填入终结点"
|
||||
AZURE_API_KEY = "填入azure openai api的密钥"
|
||||
AZURE_API_VERSION = "2023-05-15" # 默认使用 2023-05-15 版本,无需修改
|
||||
AZURE_ENGINE = "填入部署名"
|
||||
|
||||
```
|
||||
# API的使用
|
||||
|
||||
接下来就是具体怎么使用API了,还是可以参考官方文档:[快速入门 - 开始通过 Azure OpenAI 服务使用 ChatGPT 和 GPT-4 - Azure OpenAI Service | Microsoft Learn](https://learn.microsoft.com/zh-cn/azure/cognitive-services/openai/chatgpt-quickstart?pivots=programming-language-python)
|
||||
|
||||
和openai自己的api调用有点类似,都需要安装openai库,不同的是调用方式
|
||||
|
||||
```
|
||||
import openai
|
||||
openai.api_type = "azure" #固定格式,无需修改
|
||||
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT") #这里填入“终结点”
|
||||
openai.api_version = "2023-05-15" #固定格式,无需修改
|
||||
openai.api_key = os.getenv("AZURE_OPENAI_KEY") #这里填入“密钥1”或“密钥2”
|
||||
|
||||
response = openai.ChatCompletion.create(
|
||||
engine="gpt-35-turbo", #这里填入的不是模型名,是部署名
|
||||
messages=[
|
||||
{"role": "system", "content": "You are a helpful assistant."},
|
||||
{"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
|
||||
{"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},
|
||||
{"role": "user", "content": "Do other Azure Cognitive Services support this too?"}
|
||||
]
|
||||
)
|
||||
|
||||
print(response)
|
||||
print(response['choices'][0]['message']['content'])
|
||||
|
||||
```
|
||||
|
||||
需要注意的是:
|
||||
|
||||
1. engine那里填入的是部署名,不是模型名
|
||||
|
||||
2. 通过openai库获得的这个 response 和通过 request 库访问 url 获得的 response 不同,不需要 decode,已经是解析好的 json 了,直接根据键值读取即可。
|
||||
|
||||
更细节的使用方法,详见官方API文档。
|
||||
|
||||
# 关于费用
|
||||
|
||||
Azure OpenAI API 还是需要一些费用的(免费订阅只有1个月有效期),费用如下:
|
||||
|
||||

|
||||
|
||||
具体可以可以看这个网址 :[Azure OpenAI 服务 - 定价| Microsoft Azure](https://azure.microsoft.com/zh-cn/pricing/details/cognitive-services/openai-service/?cdn=disable)
|
||||
|
||||
并非网上说的什么“一年白嫖”,但注册方法以及网络问题都比直接使用openai的api要简单一些。
|
||||
@@ -16,6 +16,9 @@ from toolbox import get_conf, trimmed_format_exc
|
||||
from .bridge_chatgpt import predict_no_ui_long_connection as chatgpt_noui
|
||||
from .bridge_chatgpt import predict as chatgpt_ui
|
||||
|
||||
from .bridge_azure_test import predict_no_ui_long_connection as azure_noui
|
||||
from .bridge_azure_test import predict as azure_ui
|
||||
|
||||
from .bridge_chatglm import predict_no_ui_long_connection as chatglm_noui
|
||||
from .bridge_chatglm import predict as chatglm_ui
|
||||
|
||||
@@ -93,6 +96,24 @@ model_info = {
|
||||
"token_cnt": get_token_num_gpt35,
|
||||
},
|
||||
|
||||
"gpt-3.5-turbo-0613": {
|
||||
"fn_with_ui": chatgpt_ui,
|
||||
"fn_without_ui": chatgpt_noui,
|
||||
"endpoint": openai_endpoint,
|
||||
"max_token": 4096,
|
||||
"tokenizer": tokenizer_gpt35,
|
||||
"token_cnt": get_token_num_gpt35,
|
||||
},
|
||||
|
||||
"gpt-3.5-turbo-16k-0613": {
|
||||
"fn_with_ui": chatgpt_ui,
|
||||
"fn_without_ui": chatgpt_noui,
|
||||
"endpoint": openai_endpoint,
|
||||
"max_token": 1024 * 16,
|
||||
"tokenizer": tokenizer_gpt35,
|
||||
"token_cnt": get_token_num_gpt35,
|
||||
},
|
||||
|
||||
"gpt-4": {
|
||||
"fn_with_ui": chatgpt_ui,
|
||||
"fn_without_ui": chatgpt_noui,
|
||||
@@ -102,6 +123,16 @@ model_info = {
|
||||
"token_cnt": get_token_num_gpt4,
|
||||
},
|
||||
|
||||
# azure openai
|
||||
"azure-gpt35":{
|
||||
"fn_with_ui": azure_ui,
|
||||
"fn_without_ui": azure_noui,
|
||||
"endpoint": get_conf("AZURE_ENDPOINT"),
|
||||
"max_token": 4096,
|
||||
"tokenizer": tokenizer_gpt35,
|
||||
"token_cnt": get_token_num_gpt35,
|
||||
},
|
||||
|
||||
# api_2d
|
||||
"api2d-gpt-3.5-turbo": {
|
||||
"fn_with_ui": chatgpt_ui,
|
||||
|
||||
241
request_llm/bridge_azure_test.py
普通文件
241
request_llm/bridge_azure_test.py
普通文件
@@ -0,0 +1,241 @@
|
||||
"""
|
||||
该文件中主要包含三个函数
|
||||
|
||||
不具备多线程能力的函数:
|
||||
1. predict: 正常对话时使用,具备完备的交互功能,不可多线程
|
||||
|
||||
具备多线程调用能力的函数
|
||||
2. predict_no_ui:高级实验性功能模块调用,不会实时显示在界面上,参数简单,可以多线程并行,方便实现复杂的功能逻辑
|
||||
3. predict_no_ui_long_connection:在实验过程中发现调用predict_no_ui处理长文档时,和openai的连接容易断掉,这个函数用stream的方式解决这个问题,同样支持多线程
|
||||
"""
|
||||
|
||||
import logging
|
||||
import traceback
|
||||
import importlib
|
||||
import openai
|
||||
import time
|
||||
|
||||
|
||||
# 读取config.py文件中关于AZURE OPENAI API的信息
|
||||
from toolbox import get_conf, update_ui, clip_history, trimmed_format_exc
|
||||
TIMEOUT_SECONDS, MAX_RETRY, AZURE_ENGINE, AZURE_ENDPOINT, AZURE_API_VERSION, AZURE_API_KEY = \
|
||||
get_conf('TIMEOUT_SECONDS', 'MAX_RETRY',"AZURE_ENGINE","AZURE_ENDPOINT", "AZURE_API_VERSION", "AZURE_API_KEY")
|
||||
|
||||
|
||||
def get_full_error(chunk, stream_response):
|
||||
"""
|
||||
获取完整的从Openai返回的报错
|
||||
"""
|
||||
while True:
|
||||
try:
|
||||
chunk += next(stream_response)
|
||||
except:
|
||||
break
|
||||
return chunk
|
||||
|
||||
def predict(inputs, llm_kwargs, plugin_kwargs, chatbot, history=[], system_prompt='', stream = True, additional_fn=None):
|
||||
"""
|
||||
发送至azure openai api,流式获取输出。
|
||||
用于基础的对话功能。
|
||||
inputs 是本次问询的输入
|
||||
top_p, temperature是chatGPT的内部调优参数
|
||||
history 是之前的对话列表(注意无论是inputs还是history,内容太长了都会触发token数量溢出的错误)
|
||||
chatbot 为WebUI中显示的对话列表,修改它,然后yeild出去,可以直接修改对话界面内容
|
||||
additional_fn代表点击的哪个按钮,按钮见functional.py
|
||||
"""
|
||||
print(llm_kwargs["llm_model"])
|
||||
|
||||
if additional_fn is not None:
|
||||
import core_functional
|
||||
importlib.reload(core_functional) # 热更新prompt
|
||||
core_functional = core_functional.get_core_functions()
|
||||
if "PreProcess" in core_functional[additional_fn]: inputs = core_functional[additional_fn]["PreProcess"](inputs) # 获取预处理函数(如果有的话)
|
||||
inputs = core_functional[additional_fn]["Prefix"] + inputs + core_functional[additional_fn]["Suffix"]
|
||||
|
||||
raw_input = inputs
|
||||
logging.info(f'[raw_input] {raw_input}')
|
||||
chatbot.append((inputs, ""))
|
||||
yield from update_ui(chatbot=chatbot, history=history, msg="等待响应") # 刷新界面
|
||||
|
||||
|
||||
payload = generate_azure_payload(inputs, llm_kwargs, history, system_prompt, stream)
|
||||
|
||||
history.append(inputs); history.append("")
|
||||
|
||||
retry = 0
|
||||
while True:
|
||||
try:
|
||||
|
||||
openai.api_type = "azure"
|
||||
openai.api_version = AZURE_API_VERSION
|
||||
openai.api_base = AZURE_ENDPOINT
|
||||
openai.api_key = AZURE_API_KEY
|
||||
response = openai.ChatCompletion.create(timeout=TIMEOUT_SECONDS, **payload);break
|
||||
|
||||
except:
|
||||
retry += 1
|
||||
chatbot[-1] = ((chatbot[-1][0], "获取response失败,重试中。。。"))
|
||||
retry_msg = f",正在重试 ({retry}/{MAX_RETRY}) ……" if MAX_RETRY > 0 else ""
|
||||
yield from update_ui(chatbot=chatbot, history=history, msg="请求超时"+retry_msg) # 刷新界面
|
||||
if retry > MAX_RETRY: raise TimeoutError
|
||||
|
||||
gpt_replying_buffer = ""
|
||||
is_head_of_the_stream = True
|
||||
if stream:
|
||||
|
||||
stream_response = response
|
||||
|
||||
while True:
|
||||
try:
|
||||
chunk = next(stream_response)
|
||||
|
||||
except StopIteration:
|
||||
from toolbox import regular_txt_to_markdown; tb_str = '```\n' + trimmed_format_exc() + '```'
|
||||
chatbot[-1] = (chatbot[-1][0], f"[Local Message] 远程返回错误: \n\n{tb_str} \n\n{regular_txt_to_markdown(chunk)}")
|
||||
yield from update_ui(chatbot=chatbot, history=history, msg="远程返回错误:" + chunk) # 刷新界面
|
||||
return
|
||||
|
||||
if is_head_of_the_stream and (r'"object":"error"' not in chunk):
|
||||
# 数据流的第一帧不携带content
|
||||
is_head_of_the_stream = False; continue
|
||||
|
||||
if chunk:
|
||||
#print(chunk)
|
||||
try:
|
||||
if "delta" in chunk["choices"][0]:
|
||||
if chunk["choices"][0]["finish_reason"] == "stop":
|
||||
logging.info(f'[response] {gpt_replying_buffer}')
|
||||
break
|
||||
status_text = f"finish_reason: {chunk['choices'][0]['finish_reason']}"
|
||||
gpt_replying_buffer = gpt_replying_buffer + chunk["choices"][0]["delta"]["content"]
|
||||
|
||||
history[-1] = gpt_replying_buffer
|
||||
chatbot[-1] = (history[-2], history[-1])
|
||||
yield from update_ui(chatbot=chatbot, history=history, msg=status_text) # 刷新界面
|
||||
|
||||
except Exception as e:
|
||||
traceback.print_exc()
|
||||
yield from update_ui(chatbot=chatbot, history=history, msg="Json解析不合常规") # 刷新界面
|
||||
chunk = get_full_error(chunk, stream_response)
|
||||
|
||||
error_msg = chunk
|
||||
yield from update_ui(chatbot=chatbot, history=history, msg="Json异常" + error_msg) # 刷新界面
|
||||
return
|
||||
|
||||
|
||||
def predict_no_ui_long_connection(inputs, llm_kwargs, history=[], sys_prompt="", observe_window=None, console_slience=False):
|
||||
"""
|
||||
发送至AZURE OPENAI API,等待回复,一次性完成,不显示中间过程。但内部用stream的方法避免中途网线被掐。
|
||||
inputs:
|
||||
是本次问询的输入
|
||||
sys_prompt:
|
||||
系统静默prompt
|
||||
llm_kwargs:
|
||||
chatGPT的内部调优参数
|
||||
history:
|
||||
是之前的对话列表
|
||||
observe_window = None:
|
||||
用于负责跨越线程传递已经输出的部分,大部分时候仅仅为了fancy的视觉效果,留空即可。observe_window[0]:观测窗。observe_window[1]:看门狗
|
||||
"""
|
||||
watch_dog_patience = 5 # 看门狗的耐心, 设置5秒即可
|
||||
payload = generate_azure_payload(inputs, llm_kwargs, history, system_prompt=sys_prompt, stream=True)
|
||||
retry = 0
|
||||
while True:
|
||||
|
||||
try:
|
||||
openai.api_type = "azure"
|
||||
openai.api_version = AZURE_API_VERSION
|
||||
openai.api_base = AZURE_ENDPOINT
|
||||
openai.api_key = AZURE_API_KEY
|
||||
response = openai.ChatCompletion.create(timeout=TIMEOUT_SECONDS, **payload);break
|
||||
|
||||
except:
|
||||
retry += 1
|
||||
traceback.print_exc()
|
||||
if retry > MAX_RETRY: raise TimeoutError
|
||||
if MAX_RETRY!=0: print(f'请求超时,正在重试 ({retry}/{MAX_RETRY}) ……')
|
||||
|
||||
|
||||
stream_response = response
|
||||
result = ''
|
||||
while True:
|
||||
try: chunk = next(stream_response)
|
||||
except StopIteration:
|
||||
break
|
||||
except:
|
||||
chunk = next(stream_response) # 失败了,重试一次?再失败就没办法了。
|
||||
|
||||
if len(chunk)==0: continue
|
||||
if not chunk.startswith('data:'):
|
||||
error_msg = get_full_error(chunk, stream_response)
|
||||
if "reduce the length" in error_msg:
|
||||
raise ConnectionAbortedError("AZURE OPENAI API拒绝了请求:" + error_msg)
|
||||
else:
|
||||
raise RuntimeError("AZURE OPENAI API拒绝了请求:" + error_msg)
|
||||
if ('data: [DONE]' in chunk): break
|
||||
|
||||
delta = chunk["delta"]
|
||||
if len(delta) == 0: break
|
||||
if "role" in delta: continue
|
||||
if "content" in delta:
|
||||
result += delta["content"]
|
||||
if not console_slience: print(delta["content"], end='')
|
||||
if observe_window is not None:
|
||||
# 观测窗,把已经获取的数据显示出去
|
||||
if len(observe_window) >= 1: observe_window[0] += delta["content"]
|
||||
# 看门狗,如果超过期限没有喂狗,则终止
|
||||
if len(observe_window) >= 2:
|
||||
if (time.time()-observe_window[1]) > watch_dog_patience:
|
||||
raise RuntimeError("用户取消了程序。")
|
||||
else: raise RuntimeError("意外Json结构:"+delta)
|
||||
if chunk['finish_reason'] == 'length':
|
||||
raise ConnectionAbortedError("正常结束,但显示Token不足,导致输出不完整,请削减单次输入的文本量。")
|
||||
return result
|
||||
|
||||
|
||||
def generate_azure_payload(inputs, llm_kwargs, history, system_prompt, stream):
|
||||
"""
|
||||
整合所有信息,选择LLM模型,生成 azure openai api请求,为发送请求做准备
|
||||
"""
|
||||
|
||||
conversation_cnt = len(history) // 2
|
||||
|
||||
messages = [{"role": "system", "content": system_prompt}]
|
||||
if conversation_cnt:
|
||||
for index in range(0, 2*conversation_cnt, 2):
|
||||
what_i_have_asked = {}
|
||||
what_i_have_asked["role"] = "user"
|
||||
what_i_have_asked["content"] = history[index]
|
||||
what_gpt_answer = {}
|
||||
what_gpt_answer["role"] = "assistant"
|
||||
what_gpt_answer["content"] = history[index+1]
|
||||
if what_i_have_asked["content"] != "":
|
||||
if what_gpt_answer["content"] == "": continue
|
||||
messages.append(what_i_have_asked)
|
||||
messages.append(what_gpt_answer)
|
||||
else:
|
||||
messages[-1]['content'] = what_gpt_answer['content']
|
||||
|
||||
what_i_ask_now = {}
|
||||
what_i_ask_now["role"] = "user"
|
||||
what_i_ask_now["content"] = inputs
|
||||
messages.append(what_i_ask_now)
|
||||
|
||||
payload = {
|
||||
"model": llm_kwargs['llm_model'],
|
||||
"messages": messages,
|
||||
"temperature": llm_kwargs['temperature'], # 1.0,
|
||||
"top_p": llm_kwargs['top_p'], # 1.0,
|
||||
"n": 1,
|
||||
"stream": stream,
|
||||
"presence_penalty": 0,
|
||||
"frequency_penalty": 0,
|
||||
"engine": AZURE_ENGINE
|
||||
}
|
||||
try:
|
||||
print(f" {llm_kwargs['llm_model']} : {conversation_cnt} : {inputs[:100]} ..........")
|
||||
except:
|
||||
print('输入中可能存在乱码。')
|
||||
return payload
|
||||
|
||||
|
||||
4
version
4
version
@@ -1,5 +1,5 @@
|
||||
{
|
||||
"version": 3.41,
|
||||
"version": 3.42,
|
||||
"show_feature": true,
|
||||
"new_feature": "增加gpt-3.5-16k的支持 <-> 新增最强Arxiv论文翻译插件 <-> 修复gradio复制按钮BUG <-> 修复PDF翻译的BUG, 新增HTML中英双栏对照 <-> 添加了OpenAI图片生成插件 <-> 添加了OpenAI音频转文本总结插件 <-> 通过Slack添加对Claude的支持"
|
||||
"new_feature": "完善本地Latex矫错和翻译功能 <-> 增加gpt-3.5-16k的支持 <-> 新增最强Arxiv论文翻译插件 <-> 修复gradio复制按钮BUG <-> 修复PDF翻译的BUG, 新增HTML中英双栏对照 <-> 添加了OpenAI图片生成插件 <-> 添加了OpenAI音频转文本总结插件 <-> 通过Slack添加对Claude的支持"
|
||||
}
|
||||
|
||||
在新工单中引用
屏蔽一个用户