Mirrored from
https://github.com/binary-husky/gpt_academic.git
Synced 2025-12-06 06:26:47 +00:00
Comparing commits
69 commits
version3.4
...
version3.4
| Author | SHA1 | Commit date |
|---|---|---|
|  | 49253c4dc6 |  |
|  | 1a00093015 |  |
|  | 64f76e7401 |  |
|  | eb4c07997e |  |
|  | d684b4cdb3 |  |
|  | 601a95c948 |  |
|  | e18bef2e9c |  |
|  | f654c1af31 |  |
|  | e90048a671 |  |
|  | ea624b1510 |  |
|  | 057e3dda3c |  |
|  | 4290821a50 |  |
|  | 280e14d7b7 |  |
|  | 9f0cf9fb2b |  |
|  | b8560b7510 |  |
|  | d841d13b04 |  |
|  | efda9e5193 |  |
|  | 33d2e75aac |  |
|  | 74941170aa |  |
|  | cd38949903 |  |
|  | d87f1eb171 |  |
|  | cd1e4e1ba7 |  |
|  | cf5f348d70 |  |
|  | 0ee25f475e |  |
|  | 1fede6df7f |  |
|  | 22a65cd163 |  |
|  | 538b041ea3 |  |
|  | d7b056576d |  |
|  | cb0bb6ab4a |  |
|  | bf955aaf12 |  |
|  | 61eb0da861 |  |
|  | 5da633d94d |  |
|  | f3e4e26e2f |  |
|  | af7734dd35 |  |
|  | d5bab093f9 |  |
|  | f94b167dc2 |  |
|  | 951d5ec758 |  |
|  | 016d8ee156 |  |
|  | dca9ec4bae |  |
|  | a06e43c96b |  |
|  | 29c6bfb6cb |  |
|  | 8d7ee975a0 |  |
|  | 4bafbb3562 |  |
|  | 7fdf0a8e51 |  |
|  | 2bb13b4677 |  |
|  | 9a5a509dd9 |  |
|  | cbcb98ef6a |  |
|  | bb864c6313 |  |
|  | 6d849eeb12 |  |
|  | ef752838b0 |  |
|  | 73d4a1ff4b |  |
|  | 8c62f21aa6 |  |
|  | c40ebfc21f |  |
|  | c365ea9f57 |  |
|  | 12d66777cc |  |
|  | 9ac3d0d65d |  |
|  | 9fd212652e |  |
|  | 790a1cf12a |  |
|  | 3ecf2977a8 |  |
|  | aeddf6b461 |  |
|  | ce0d8b9dab |  |
|  | 3c00e7a143 |  |
|  | ef1bfdd60f |  |
|  | e48d92e82e |  |
|  | 110510997f |  |
|  | b52695845e |  |
|  | f30c9c6d3b |  |
|  | ff5403eac6 |  |
|  | f3205994ea |  |
44
.github/workflows/build-with-latex.yml
vendored
Normal file
@@ -0,0 +1,44 @@
+# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
+name: Create and publish a Docker image for Latex support
+
+on:
+  push:
+    branches:
+      - 'master'
+
+env:
+  REGISTRY: ghcr.io
+  IMAGE_NAME: ${{ github.repository }}_with_latex
+
+jobs:
+  build-and-push-image:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v3
+
+      - name: Log in to the Container registry
+        uses: docker/login-action@v2
+        with:
+          registry: ${{ env.REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Extract metadata (tags, labels) for Docker
+        id: meta
+        uses: docker/metadata-action@v4
+        with:
+          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+
+      - name: Build and push Docker image
+        uses: docker/build-push-action@v4
+        with:
+          context: .
+          push: true
+          file: docs/GithubAction+NoLocal+Latex
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
129
README.md
@@ -16,7 +16,7 @@ To translate this project to arbitary language with GPT, read and run [`multi_la
 >
 > 1. Note that only function plugins (buttons) marked in **red** support reading files, and some plugins are located in the **dropdown menu** of the plugin area. In addition, we welcome and process PRs for any new plugin with the **highest priority**!
 >
-> 2. The function of every file in this project is described in detail in the self-analysis report [`self_analysis.md`](https://github.com/binary-husky/chatgpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A). As versions iterate, you can also click the relevant function plugin at any time and call GPT to regenerate the project's self-analysis report. Frequently asked questions are collected in the [`wiki`](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98). [Installation](#installation).
+> 2. The function of every file in this project is described in detail in the self-analysis report [`self_analysis.md`](https://github.com/binary-husky/gpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A). As versions iterate, you can also click the relevant function plugin at any time and call GPT to regenerate the project's self-analysis report. Frequently asked questions are collected in the [`wiki`](https://github.com/binary-husky/gpt_academic/wiki/%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98). [Installation](#installation).
 >
 > 3. This project is compatible with, and encourages trying, domestic large language models such as chatglm, RWKV, PanGu, etc. Multiple api-keys can coexist; in the config file, write e.g. `API_KEY="openai-key1,openai-key2,api2d-key3"`. To temporarily switch `API_KEY`, enter a temporary `API_KEY` in the input area and press Enter to submit; it takes effect immediately.
 
@@ -31,23 +31,23 @@ To translate this project to arbitary language with GPT, read and run [`multi_la
|
|||||||
一键中英互译 | 一键中英互译
|
一键中英互译 | 一键中英互译
|
||||||
一键代码解释 | 显示代码、解释代码、生成代码、给代码加注释
|
一键代码解释 | 显示代码、解释代码、生成代码、给代码加注释
|
||||||
[自定义快捷键](https://www.bilibili.com/video/BV14s4y1E7jN) | 支持自定义快捷键
|
[自定义快捷键](https://www.bilibili.com/video/BV14s4y1E7jN) | 支持自定义快捷键
|
||||||
模块化设计 | 支持自定义强大的[函数插件](https://github.com/binary-husky/chatgpt_academic/tree/master/crazy_functions),插件支持[热更新](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)
|
模块化设计 | 支持自定义强大的[函数插件](https://github.com/binary-husky/gpt_academic/tree/master/crazy_functions),插件支持[热更新](https://github.com/binary-husky/gpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)
|
||||||
[自我程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] [一键读懂](https://github.com/binary-husky/chatgpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)本项目的源代码
|
[自我程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] [一键读懂](https://github.com/binary-husky/gpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)本项目的源代码
|
||||||
[程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] 一键可以剖析其他Python/C/C++/Java/Lua/...项目树
|
[程序剖析](https://www.bilibili.com/video/BV1cj411A7VW) | [函数插件] 一键可以剖析其他Python/C/C++/Java/Lua/...项目树
|
||||||
读论文、[翻译](https://www.bilibili.com/video/BV1KT411x7Wn)论文 | [函数插件] 一键解读latex/pdf论文全文并生成摘要
|
读论文、[翻译](https://www.bilibili.com/video/BV1KT411x7Wn)论文 | [函数插件] 一键解读latex/pdf论文全文并生成摘要
|
||||||
Latex全文[翻译](https://www.bilibili.com/video/BV1nk4y1Y7Js/)、[润色](https://www.bilibili.com/video/BV1FT411H7c5/) | [函数插件] 一键翻译或润色latex论文
|
Latex全文[翻译](https://www.bilibili.com/video/BV1nk4y1Y7Js/)、[润色](https://www.bilibili.com/video/BV1FT411H7c5/) | [函数插件] 一键翻译或润色latex论文
|
||||||
批量注释生成 | [函数插件] 一键批量生成函数注释
|
批量注释生成 | [函数插件] 一键批量生成函数注释
|
||||||
Markdown[中英互译](https://www.bilibili.com/video/BV1yo4y157jV/) | [函数插件] 看到上面5种语言的[README](https://github.com/binary-husky/chatgpt_academic/blob/master/docs/README_EN.md)了吗?
|
Markdown[中英互译](https://www.bilibili.com/video/BV1yo4y157jV/) | [函数插件] 看到上面5种语言的[README](https://github.com/binary-husky/gpt_academic/blob/master/docs/README_EN.md)了吗?
|
||||||
chat分析报告生成 | [函数插件] 运行后自动生成总结汇报
|
chat分析报告生成 | [函数插件] 运行后自动生成总结汇报
|
||||||
[PDF论文全文翻译功能](https://www.bilibili.com/video/BV1KT411x7Wn) | [函数插件] PDF论文提取题目&摘要+翻译全文(多线程)
|
[PDF论文全文翻译功能](https://www.bilibili.com/video/BV1KT411x7Wn) | [函数插件] PDF论文提取题目&摘要+翻译全文(多线程)
|
||||||
[Arxiv小助手](https://www.bilibili.com/video/BV1LM4y1279X) | [函数插件] 输入arxiv文章url即可一键翻译摘要+下载PDF
|
[Arxiv小助手](https://www.bilibili.com/video/BV1LM4y1279X) | [函数插件] 输入arxiv文章url即可一键翻译摘要+下载PDF
|
||||||
[谷歌学术统合小助手](https://www.bilibili.com/video/BV19L411U7ia) | [函数插件] 给定任意谷歌学术搜索页面URL,让gpt帮你[写relatedworks](https://www.bilibili.com/video/BV1GP411U7Az/)
|
[谷歌学术统合小助手](https://www.bilibili.com/video/BV19L411U7ia) | [函数插件] 给定任意谷歌学术搜索页面URL,让gpt帮你[写relatedworks](https://www.bilibili.com/video/BV1GP411U7Az/)
|
||||||
互联网信息聚合+GPT | [函数插件] 一键[让GPT先从互联网获取信息](https://www.bilibili.com/video/BV1om4y127ck),再回答问题,让信息永不过时
|
互联网信息聚合+GPT | [函数插件] 一键[让GPT先从互联网获取信息](https://www.bilibili.com/video/BV1om4y127ck),再回答问题,让信息永不过时
|
||||||
Arxiv论文精密翻译 | [函数插件] 一键[以超高质量翻译arxiv论文](https://www.bilibili.com/video/BV1dz4y1v77A/),迄今为止最好的论文翻译工具
|
⭐Arxiv论文精细翻译 | [函数插件] 一键[以超高质量翻译arxiv论文](https://www.bilibili.com/video/BV1dz4y1v77A/),迄今为止最好的论文翻译工具⭐
|
||||||
公式/图片/表格显示 | 可以同时显示公式的[tex形式和渲染形式](https://user-images.githubusercontent.com/96192199/230598842-1d7fcddd-815d-40ee-af60-baf488a199df.png),支持公式、代码高亮
|
公式/图片/表格显示 | 可以同时显示公式的[tex形式和渲染形式](https://user-images.githubusercontent.com/96192199/230598842-1d7fcddd-815d-40ee-af60-baf488a199df.png),支持公式、代码高亮
|
||||||
多线程函数插件支持 | 支持多线调用chatgpt,一键处理[海量文本](https://www.bilibili.com/video/BV1FT411H7c5/)或程序
|
多线程函数插件支持 | 支持多线调用chatgpt,一键处理[海量文本](https://www.bilibili.com/video/BV1FT411H7c5/)或程序
|
||||||
启动暗色gradio[主题](https://github.com/binary-husky/chatgpt_academic/issues/173) | 在浏览器url后面添加```/?__theme=dark```可以切换dark主题
|
启动暗色gradio[主题](https://github.com/binary-husky/gpt_academic/issues/173) | 在浏览器url后面添加```/?__theme=dark```可以切换dark主题
|
||||||
[多LLM模型](https://www.bilibili.com/video/BV1wT411p7yf)支持,[API2D](https://api2d.com/)接口支持 | 同时被GPT3.5、GPT4、[清华ChatGLM](https://github.com/THUDM/ChatGLM-6B)、[复旦MOSS](https://github.com/OpenLMLab/MOSS)同时伺候的感觉一定会很不错吧?
|
[多LLM模型](https://www.bilibili.com/video/BV1wT411p7yf)支持 | 同时被GPT3.5、GPT4、[清华ChatGLM](https://github.com/THUDM/ChatGLM-6B)、[复旦MOSS](https://github.com/OpenLMLab/MOSS)同时伺候的感觉一定会很不错吧?
|
||||||
更多LLM模型接入,支持[huggingface部署](https://huggingface.co/spaces/qingxu98/gpt-academic) | 加入Newbing接口(新必应),引入清华[Jittorllms](https://github.com/Jittor/JittorLLMs)支持[LLaMA](https://github.com/facebookresearch/llama),[RWKV](https://github.com/BlinkDL/ChatRWKV)和[盘古α](https://openi.org.cn/pangu/)
|
更多LLM模型接入,支持[huggingface部署](https://huggingface.co/spaces/qingxu98/gpt-academic) | 加入Newbing接口(新必应),引入清华[Jittorllms](https://github.com/Jittor/JittorLLMs)支持[LLaMA](https://github.com/facebookresearch/llama),[RWKV](https://github.com/BlinkDL/ChatRWKV)和[盘古α](https://openi.org.cn/pangu/)
|
||||||
更多新功能展示(图像生成等) …… | 见本文档结尾处 ……
|
更多新功能展示(图像生成等) …… | 见本文档结尾处 ……
|
||||||
|
|
||||||
@@ -91,13 +91,13 @@ Precision Arxiv paper translation | [Function plugin] One-click [high-quality translation of arxiv papers
 
 1. Download the project
 ```sh
-git clone https://github.com/binary-husky/chatgpt_academic.git
+git clone https://github.com/binary-husky/gpt_academic.git
-cd chatgpt_academic
+cd gpt_academic
 ```
 
 2. Configure API_KEY
 
-In `config.py`, configure the API KEY and other settings, [special network environment settings](https://github.com/binary-husky/gpt_academic/issues/1).
+In `config.py`, configure the API KEY and other settings, [click to see how to configure for special network environments](https://github.com/binary-husky/gpt_academic/issues/1).
 
 (P.S. When the program runs, it first checks for a private configuration file named `config_private.py` and uses its settings to override those with the same name in `config.py`. So if you understand our configuration reading logic, we strongly recommend creating a new configuration file named `config_private.py` next to `config.py` and moving (copying) the settings from `config.py` into `config_private.py`. `config_private.py` is not tracked by git, which keeps your private information safer. P.S. The project also supports configuring most options through `environment variables`; see the `docker-compose` file for the environment-variable format. Reading priority: `environment variables` > `config_private.py` > `config.py`)
 
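The reading priority described above (`environment variables` > `config_private.py` > `config.py`) can be sketched in Python. `read_single_conf` below is a hypothetical helper written for illustration, not the project's actual `toolbox` implementation:

```python
import importlib
import os

def read_single_conf(name, default=None):
    """Sketch of the documented read priority:
    environment variable > config_private.py > config.py.
    Illustrative only; the real project's config reader differs."""
    # 1) An environment variable always wins.
    if name in os.environ:
        return os.environ[name]
    # 2) The private, git-ignored config overrides the shared one.
    try:
        cfg = importlib.import_module("config_private")
        if hasattr(cfg, name):
            return getattr(cfg, name)
    except ImportError:
        pass
    # 3) Fall back to the shared config.py, then to the default.
    try:
        return getattr(importlib.import_module("config"), name)
    except (ImportError, AttributeError):
        return default
```

If neither config module is present, the helper degrades gracefully to the environment variable or the supplied default.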
@@ -113,6 +113,7 @@ conda activate gptac_venv                 # activate the anaconda environment
 python -m pip install -r requirements.txt # same steps as a pip installation
 ```
 
+
 <details><summary>Click to expand if you need Tsinghua ChatGLM / Fudan MOSS supported as backends</summary>
 <p>
 
@@ -139,19 +140,13 @@ AVAIL_LLM_MODELS = ["gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-
 python main.py
 ```
 
-5. Test the function plugins
-```
-- Test the function plugin template function (asks gpt what happened in history on this day); you can use this function as a template to implement more complex features
-    Click "[函数插件模板Demo] 历史上的今天"
-```
 
 ## Installation - Method 2: Using Docker
 
-1. ChatGPT only (recommended for most people)
+1. ChatGPT only (recommended for most people; equivalent to docker-compose scheme 1)
 
 ``` sh
-git clone https://github.com/binary-husky/chatgpt_academic.git  # download the project
+git clone https://github.com/binary-husky/gpt_academic.git  # download the project
-cd chatgpt_academic  # enter the path
+cd gpt_academic  # enter the path
 nano config.py  # edit config.py with any text editor: configure "Proxy", "API_KEY", "WEB_PORT" (e.g. 50923) etc.
 docker build -t gpt-academic .  # install
 
@@ -160,40 +155,43 @@ docker run --rm -it --net=host gpt-academic
 #(Last step - option 2) In a macOS/windows environment, you can only use the -p option to expose a container port (e.g. 50923) to a host port
 docker run --rm -it -e WEB_PORT=50923 -p 50923:50923 gpt-academic
 ```
+P.S. If you need the Latex-dependent plugin features, see the Wiki. Alternatively, you can get the Latex features directly via docker-compose (modify docker-compose.yml, keep scheme 4 and delete the other schemes).
 
 2. ChatGPT + ChatGLM + MOSS (requires familiarity with Docker)
 
 ``` sh
-# Modify docker-compose.yml: delete schemes 1 and 3 and keep scheme 2. Then adjust the scheme-2 configuration in docker-compose.yml by following the comments in it
+# Modify docker-compose.yml: keep scheme 2 and delete the other schemes. Then adjust the scheme-2 configuration in docker-compose.yml by following the comments in it
 docker-compose up
 ```
 
 3. ChatGPT + LLAMA + PanGu + RWKV (requires familiarity with Docker)
 ``` sh
-# Modify docker-compose.yml: delete schemes 1 and 2 and keep scheme 3. Then adjust the scheme-3 configuration in docker-compose.yml by following the comments in it
+# Modify docker-compose.yml: keep scheme 3 and delete the other schemes. Then adjust the scheme-3 configuration in docker-compose.yml by following the comments in it
 docker-compose up
 ```
 
 
 ## Installation - Method 3: Other deployment options
 1. One-click run script.
-Windows users completely unfamiliar with the python environment can download the one-click run script published in [Release](https://github.com/binary-husky/gpt_academic/releases) to install the version without local models,
+Windows users completely unfamiliar with the python environment can download the one-click run script published in [Release](https://github.com/binary-husky/gpt_academic/releases) to install the version without local models.
-Users who already have python on their computer are advised against this method (installing the plugins' dependencies on top of it is troublesome).
 The script is contributed from [oobabooga](https://github.com/oobabooga/one-click-installers).
 
 2. Run with docker-compose.
 Read docker-compose.yml and follow its instructions.
 
-3. How to use a reverse-proxy URL / Microsoft Azure API.
+3. How to use a reverse-proxy URL
 Configure API_URL_REDIRECT following the instructions in `config.py`.
 
-4. Remote cloud-server deployment (requires cloud-server knowledge and experience).
-Please visit [deployment wiki-1](https://github.com/binary-husky/chatgpt_academic/wiki/%E4%BA%91%E6%9C%8D%E5%8A%A1%E5%99%A8%E8%BF%9C%E7%A8%8B%E9%83%A8%E7%BD%B2%E6%8C%87%E5%8D%97)
+4. Microsoft Azure API
+Configure following the instructions in `config.py` (the four AZURE settings such as AZURE_ENDPOINT)
 
-5. Use WSL2 (Windows Subsystem for Linux).
-Please visit [deployment wiki-2](https://github.com/binary-husky/chatgpt_academic/wiki/%E4%BD%BF%E7%94%A8WSL2%EF%BC%88Windows-Subsystem-for-Linux-%E5%AD%90%E7%B3%BB%E7%BB%9F%EF%BC%89%E9%83%A8%E7%BD%B2)
+5. Remote cloud-server deployment (requires cloud-server knowledge and experience).
+Please visit [deployment wiki-1](https://github.com/binary-husky/gpt_academic/wiki/%E4%BA%91%E6%9C%8D%E5%8A%A1%E5%99%A8%E8%BF%9C%E7%A8%8B%E9%83%A8%E7%BD%B2%E6%8C%87%E5%8D%97)
 
-6. How to run under a secondary URL (such as `http://localhost/subpath`).
+6. Use WSL2 (Windows Subsystem for Linux).
+Please visit [deployment wiki-2](https://github.com/binary-husky/gpt_academic/wiki/%E4%BD%BF%E7%94%A8WSL2%EF%BC%88Windows-Subsystem-for-Linux-%E5%AD%90%E7%B3%BB%E7%BB%9F%EF%BC%89%E9%83%A8%E7%BD%B2)
 
+7. How to run under a secondary URL (such as `http://localhost/subpath`).
 Please see the [FastAPI running instructions](docs/WithFastapi.md)
 
 ---
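Trimming docker-compose.yml down to a single scheme, as the instructions above describe, leaves a file shaped roughly like this. The service name and key values below are assumptions for illustration, not the project's actual compose file:

```yaml
# Hypothetical docker-compose.yml trimmed to scheme 1 (ChatGPT only)
version: '3'
services:
  gpt_academic:            # assumed service name
    image: gpt-academic    # built earlier with `docker build -t gpt-academic .`
    environment:
      API_KEY: 'sk-openaikey1,sk-openaikey2'   # same format as in config.py
      WEB_PORT: '50923'
    ports:
      - '50923:50923'      # expose the web UI to the host
```

Running `docker-compose up` with only one scheme block left avoids starting the other backends by accident.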
@@ -220,7 +218,7 @@ docker-compose up
 
 Write powerful function plugins to perform any task you can and cannot imagine.
 Writing and debugging plugins in this project is easy: with some basic python knowledge, you can implement your own plugin features by imitating the template we provide.
-For details, see the [Function Plugin Guide](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97).
+For details, see the [Function Plugin Guide](https://github.com/binary-husky/gpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97).
 
 ---
 # Latest Update
@@ -228,38 +226,33 @@ docker-compose up
 
 1. Conversation saving. In the function plugin area, call `保存当前的对话` to save the current conversation as a readable and restorable html file;
 in addition, call `载入对话历史存档` in the function plugin area (dropdown menu) to restore a previous session.
-Tip: clicking `载入对话历史存档` without specifying a file shows the historical html archive cache; clicking `删除所有本地对话历史记录` deletes all html archive caches.
+Tip: clicking `载入对话历史存档` without specifying a file shows the historical html archive cache.
 <div align="center">
 <img src="https://user-images.githubusercontent.com/96192199/235222390-24a9acc0-680f-49f5-bc81-2f3161f1e049.png" width="500" >
 </div>
 
+2. ⭐Latex/Arxiv paper translation⭐
 
-2. Report generation. Most plugins generate a work report after they finish
 <div align="center">
-<img src="https://user-images.githubusercontent.com/96192199/227503770-fe29ce2c-53fd-47b0-b0ff-93805f0c2ff4.png" height="300" >
+<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/002a1a75-ace0-4e6a-94e2-ec1406a746f1" height="250" > ===>
-<img src="https://user-images.githubusercontent.com/96192199/227504617-7a497bb3-0a2a-4b50-9a8a-95ae60ea7afd.png" height="300" >
+<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/9fdcc391-f823-464f-9322-f8719677043b" height="250" >
-<img src="https://user-images.githubusercontent.com/96192199/227504005-efeaefe0-b687-49d0-bf95-2d7b7e66c348.png" height="300" >
 </div>
 
-3. Modular function design: simple interfaces that support powerful features
+3. Report generation. Most plugins generate a work report after they finish
+<div align="center">
+<img src="https://user-images.githubusercontent.com/96192199/227503770-fe29ce2c-53fd-47b0-b0ff-93805f0c2ff4.png" height="250" >
+<img src="https://user-images.githubusercontent.com/96192199/227504617-7a497bb3-0a2a-4b50-9a8a-95ae60ea7afd.png" height="250" >
+</div>
 
+4. Modular function design: simple interfaces that support powerful features
 <div align="center">
 <img src="https://user-images.githubusercontent.com/96192199/229288270-093643c1-0018-487a-81e6-1d7809b6e90f.png" height="400" >
 <img src="https://user-images.githubusercontent.com/96192199/227504931-19955f78-45cd-4d1c-adac-e71e50957915.png" height="400" >
 </div>
 
-4. This is an open-source project capable of "translating and explaining itself"
+5. Translating and explaining other open-source projects
 <div align="center">
-<img src="https://user-images.githubusercontent.com/96192199/226936850-c77d7183-0749-4c1c-9875-fd4891842d0c.png" width="500" >
+<img src="https://user-images.githubusercontent.com/96192199/226935232-6b6a73ce-8900-4aee-93f9-733c7e6fef53.png" height="250" >
+<img src="https://user-images.githubusercontent.com/96192199/226969067-968a27c1-1b9c-486b-8b81-ab2de8d3f88a.png" height="250" >
 
-5. Translating and explaining other open-source projects is no problem either
-<div align="center">
-<img src="https://user-images.githubusercontent.com/96192199/226935232-6b6a73ce-8900-4aee-93f9-733c7e6fef53.png" width="500" >
-</div>
 
-<div align="center">
-<img src="https://user-images.githubusercontent.com/96192199/226969067-968a27c1-1b9c-486b-8b81-ab2de8d3f88a.png" width="500" >
 </div>
 
 6. A small feature decorating with [live2d](https://github.com/fghrsh/live2d_demo) (disabled by default; requires modifying `config.py`)
@@ -284,15 +277,11 @@ Tip: clicking `载入对话历史存档` without specifying a file shows the historical h
 
 10. Full-text Latex proofreading and correction
 <div align="center">
-<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/651ccd98-02c9-4464-91e1-77a6b7d1b033" height="250" > ===>
+<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/651ccd98-02c9-4464-91e1-77a6b7d1b033" height="200" > ===>
-<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/476f66d9-7716-4537-b5c1-735372c25adb" height="250">
+<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/476f66d9-7716-4537-b5c1-735372c25adb" height="200">
 </div>
 
-10. Latex/Arxiv paper translation
-<div align="center">
-<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/002a1a75-ace0-4e6a-94e2-ec1406a746f1" height="250" >
-<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/9fdcc391-f823-464f-9322-f8719677043b" height="250" >
-</div>
 
 ## Versions:
 - version 3.5 (Todo): call all of this project's function plugins using natural language (high priority)
@@ -314,30 +303,32 @@ gpt_academic developer QQ group 2: 610599535
 
 - Known issues
     - Some browser translation plugins interfere with the front-end of this software
-    - The official Gradio currently has many compatibility bugs; be sure to install Gradio using requirement.txt
+    - The official Gradio currently has many compatibility bugs; be sure to install Gradio using `requirement.txt`
 
 ## References and Learning
 
 ```
-The code references designs from many other excellent projects, mainly including:
+The code references designs from many other excellent projects, in no particular order:
 
-# Project 1: Tsinghua ChatGLM-6B:
+# Tsinghua ChatGLM-6B:
 https://github.com/THUDM/ChatGLM-6B
 
-# Project 2: Tsinghua JittorLLMs:
+# Tsinghua JittorLLMs:
 https://github.com/Jittor/JittorLLMs
 
-# Project 3: Edge-GPT:
+# ChatPaper:
-https://github.com/acheong08/EdgeGPT
 
-# Project 4: ChuanhuChatGPT:
-https://github.com/GaiZhenbiao/ChuanhuChatGPT
 
-# Project 5: ChatPaper:
 https://github.com/kaixindelele/ChatPaper
 
-# More:
+# Edge-GPT:
+https://github.com/acheong08/EdgeGPT
 
+# ChuanhuChatGPT:
+https://github.com/GaiZhenbiao/ChuanhuChatGPT
 
+# Oobabooga one-click installer:
+https://github.com/oobabooga/one-click-installers
 
+# More:
 https://github.com/gradio-app/gradio
 https://github.com/fghrsh/live2d_demo
-https://github.com/oobabooga/one-click-installers
 ```
12
config.py
@@ -1,6 +1,7 @@
 # [step 1]>> e.g.: API_KEY = "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" (this key is invalid)
 API_KEY = "sk-此处填API密钥"    # Several API-KEYs can be given at once, separated by English commas, e.g. API_KEY = "sk-openaikey1,sk-openaikey2,fkxxxx-api2dkey1,fkxxxx-api2dkey2"
 
 
 # [step 2]>> Set to True to use a proxy; do not change this if deploying directly on an overseas server
 USE_PROXY = False
 if USE_PROXY:
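The comment above says several API keys can live in one `API_KEY` string. A minimal sketch of how such a string could be split and grouped by provider prefix; the helper name and grouping rule are illustrative assumptions, not the project's actual key-selection logic:

```python
def split_api_keys(api_key_str):
    """Split a comma-separated API_KEY string and group keys by prefix.
    OpenAI keys start with 'sk-'; api2d keys start with 'fk' in the
    example format shown in config.py. Illustrative sketch only."""
    keys = [k.strip() for k in api_key_str.split(",") if k.strip()]
    openai_keys = [k for k in keys if k.startswith("sk-")]
    api2d_keys = [k for k in keys if k.startswith("fk")]
    return openai_keys, api2d_keys
```

A caller could then round-robin over `openai_keys` per request, which is one plausible way multiple coexisting keys get used.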
@@ -46,8 +47,8 @@ MAX_RETRY = 2
 
 # Model selection (note: LLM_MODEL is the model selected by default; it must also be included in the AVAIL_LLM_MODELS switch list)
 LLM_MODEL = "gpt-3.5-turbo" # options ↓↓↓
-AVAIL_LLM_MODELS = ["gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "chatglm", "moss", "newbing", "newbing-free", "stack-claude"]
+AVAIL_LLM_MODELS = ["gpt-3.5-turbo-16k", "gpt-3.5-turbo", "azure-gpt35", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "chatglm", "moss", "newbing", "newbing-free", "stack-claude"]
-# P.S. Other available models also include ["newbing-free", "jittorllms_rwkv", "jittorllms_pangualpha", "jittorllms_llama"]
+# P.S. Other available models also include ["gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k-0613", "newbing-free", "jittorllms_rwkv", "jittorllms_pangualpha", "jittorllms_llama"]
 
 # Execution mode (CPU/GPU) for local LLM models such as ChatGLM
 LOCAL_MODEL_DEVICE = "cpu" # options: "cuda"
@@ -81,3 +82,10 @@ your bing cookies here
 # To use Slack Claude, see request_llm/README.md for a detailed tutorial
 SLACK_CLAUDE_BOT_ID = ''
 SLACK_CLAUDE_USER_TOKEN = ''
 
 
+# To use AZURE, see the extra document docs\use_azure.md for details
+AZURE_ENDPOINT = "https://你的api名称.openai.azure.com/"
+AZURE_API_KEY = "填入azure openai api的密钥"
+AZURE_API_VERSION = "填入api版本"
+AZURE_ENGINE = "填入ENGINE"
@@ -112,11 +112,11 @@ def get_crazy_functions():
         "AsButton": False,  # put into the dropdown menu
         "Function": HotReload(解析项目本身)
     },
-    "[老旧的Demo] 把本项目源代码切换成全英文": {
-        # HotReload means hot reload: after modifying the plugin code, it takes effect without restarting the program
-        "AsButton": False,  # put into the dropdown menu
-        "Function": HotReload(全项目切换英文)
-    },
+    # "[老旧的Demo] 把本项目源代码切换成全英文": {
+    #     # HotReload means hot reload: after modifying the plugin code, it takes effect without restarting the program
+    #     "AsButton": False,  # put into the dropdown menu
+    #     "Function": HotReload(全项目切换英文)
+    # },
     "[插件demo] 历史上的今天": {
         # HotReload means hot reload: after modifying the plugin code, it takes effect without restarting the program
         "Function": HotReload(高阶功能模板函数)
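Each plugin above is one dict entry whose keys control UI placement (`AsButton`, `Color`, ...) and whose `Function` is the callable, wrapped in `HotReload` in the real project. A minimal self-contained sketch of that registration pattern, with a hypothetical plugin function and English key names invented for illustration:

```python
def build_plugin_registry():
    """Sketch of the registration pattern used in get_crazy_functions():
    each plugin is a dict entry; optional keys control UI placement.
    The plugin function and key name here are hypothetical."""
    def on_this_day(txt, **kwargs):
        # A stand-in for the real generator-based plugin signature.
        return f"Looking up historical events for: {txt}"

    return {
        "[Plugin demo] Today in history": {
            "AsButton": False,        # shown in the dropdown menu, not as a button
            "Function": on_this_day,  # the real project wraps this in HotReload(...)
        },
    }
```

The UI layer can then iterate over the registry, creating a button or dropdown entry per key and dispatching clicks to `entry["Function"]`.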
@@ -348,25 +348,52 @@ def get_crazy_functions():
     try:
         from crazy_functions.Latex输出PDF结果 import Latex英文纠错加PDF对比
         function_plugins.update({
-            "[功能尚不稳定] Latex英文纠错+LatexDiff高亮修正位置": {
+            "Latex英文纠错+高亮修正位置 [需Latex]": {
                 "Color": "stop",
                 "AsButton": False,
-                # "AdvancedArgs": True,
-                # "ArgsReminder": "",
+                "AdvancedArgs": True,
+                "ArgsReminder": "如果有必要, 请在此处追加更细致的矫错指令(使用英文)。",
                 "Function": HotReload(Latex英文纠错加PDF对比)
             }
         })
         from crazy_functions.Latex输出PDF结果 import Latex翻译中文并重新编译PDF
         function_plugins.update({
-            "[功能尚不稳定] Latex翻译/Arixv翻译+重构PDF": {
+            "Arixv翻译(输入arxivID)[需Latex]": {
                 "Color": "stop",
                 "AsButton": False,
-                # "AdvancedArgs": True,
-                # "ArgsReminder": "",
+                "AdvancedArgs": True,
+                "ArgsReminder":
+                    "如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 "+
+                    "例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: " + 'If the term "agent" is used in this section, it should be translated to "智能体". ',
+                "Function": HotReload(Latex翻译中文并重新编译PDF)
+            }
+        })
+        function_plugins.update({
+            "本地论文翻译(上传Latex压缩包)[需Latex]": {
+                "Color": "stop",
+                "AsButton": False,
+                "AdvancedArgs": True,
+                "ArgsReminder":
+                    "如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 "+
+                    "例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: " + 'If the term "agent" is used in this section, it should be translated to "智能体". ',
                 "Function": HotReload(Latex翻译中文并重新编译PDF)
             }
         })
     except:
         print('Load function plugin failed')
-    ###################### Plugin group n ###########################
+
+    # try:
+    #     from crazy_functions.虚空终端 import 终端
+    #     function_plugins.update({
+    #         "超级终端": {
+    #             "Color": "stop",
+    #             "AsButton": False,
+    #             # "AdvancedArgs": True,
+    #             # "ArgsReminder": "",
+    #             "Function": HotReload(终端)
+    #         }
+    #     })
+    # except:
+    #     print('Load function plugin failed')
 
     return function_plugins
@@ -30,7 +30,7 @@ def 知识库问答(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_pro
         )
         yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI
         from .crazy_utils import try_install_deps
-        try_install_deps(['zh_langchain==0.2.0'])
+        try_install_deps(['zh_langchain==0.2.1'])
 
     # < -------------------- read parameters --------------- >
     if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
@@ -84,7 +84,7 @@ def 读取知识库作答(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
         chatbot.append(["依赖不足", "导入依赖失败。正在尝试自动安装,请查看终端的输出或耐心等待..."])
         yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI
         from .crazy_utils import try_install_deps
-        try_install_deps(['zh_langchain==0.2.0'])
+        try_install_deps(['zh_langchain==0.2.1'])
 
     # < ------------------- --------------- >
     kai = knowledge_archive_interface()
@@ -1,12 +1,13 @@
 from toolbox import update_ui, trimmed_format_exc, get_conf, objdump, objload, promote_file_to_downloadzone
 from toolbox import CatchException, report_execption, update_ui_lastest_msg, zip_result, gen_time_str
+from functools import partial
 import glob, os, requests, time
 pj = os.path.join
 ARXIV_CACHE_DIR = os.path.expanduser(f"~/arxiv_cache/")
 
 # =================================== Utility functions ===============================================
-沙雕GPT啊别犯这些低级翻译错误 = 'You must to translate "agent" to "智能体". '
-def switch_prompt(pfg, mode):
+专业词汇声明 = 'If the term "agent" is used in this section, it should be translated to "智能体". '
+def switch_prompt(pfg, mode, more_requirement):
     """
     Generate prompts and system prompts based on the mode for proofreading or translating.
     Args:
@@ -18,14 +19,14 @@ def switch_prompt(pfg, mode):
     - sys_prompt_array: A list of strings containing prompts for system prompts.
     """
     n_split = len(pfg.sp_file_contents)
-    if mode == 'proofread':
+    if mode == 'proofread_en':
         inputs_array = [r"Below is a section from an academic paper, proofread this section." +
-                        r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " +
+                        r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " + more_requirement +
                         r"Answer me only with the revised text:" +
                         f"\n\n{frag}" for frag in pfg.sp_file_contents]
         sys_prompt_array = ["You are a professional academic paper writer." for _ in range(n_split)]
     elif mode == 'translate_zh':
-        inputs_array = [r"Below is a section from an English academic paper, translate it into Chinese." + 沙雕GPT啊别犯这些低级翻译错误 +
+        inputs_array = [r"Below is a section from an English academic paper, translate it into Chinese. " + more_requirement +
                         r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " +
                         r"Answer me only with the translated text:" +
                         f"\n\n{frag}" for frag in pfg.sp_file_contents]
@@ -69,6 +70,12 @@ def move_project(project_folder, arxiv_id=None):
         shutil.rmtree(new_workfolder)
     except:
         pass

+    # align subfolder if there is a folder wrapper
+    items = glob.glob(pj(project_folder,'*'))
+    if len(glob.glob(pj(project_folder,'*.tex'))) == 0 and len(items) == 1:
+        if os.path.isdir(items[0]): project_folder = items[0]
+
     shutil.copytree(src=project_folder, dst=new_workfolder)
     return new_workfolder

@@ -79,7 +86,7 @@ def arxiv_download(chatbot, history, txt):
         os.makedirs(translation_dir)
         target_file = pj(translation_dir, 'translate_zh.pdf')
         if os.path.exists(target_file):
-            promote_file_to_downloadzone(target_file)
+            promote_file_to_downloadzone(target_file, rename_file=None, chatbot=chatbot)
             return target_file
         return False
     def is_float(s):
@@ -88,8 +95,10 @@ def arxiv_download(chatbot, history, txt):
             return True
         except ValueError:
             return False
-    if ('.' in txt) and ('/' not in txt) and is_float(txt):
-        txt = 'https://arxiv.org/abs/' + txt
+    if ('.' in txt) and ('/' not in txt) and is_float(txt): # is arxiv ID
+        txt = 'https://arxiv.org/abs/' + txt.strip()
+    if ('.' in txt) and ('/' not in txt) and is_float(txt[:10]): # is arxiv ID
+        txt = 'https://arxiv.org/abs/' + txt[:10]
     if not txt.startswith('https://arxiv.org'):
         return txt, None

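The hunk above normalizes a bare arXiv ID such as "2303.12712" (or one with a version suffix like "2303.12712v5") into an abs URL before downloading. A small standalone sketch of that detection logic, mirroring the diff's `is_float` check (the `normalize` wrapper name is ours, not the plugin's):

```python
def is_float(s):
    # an arXiv ID such as "2303.12712" parses as a float
    try:
        float(s)
        return True
    except ValueError:
        return False

def normalize(txt):
    # hypothetical wrapper around the two checks shown in the diff
    txt = txt.strip()
    if ('.' in txt) and ('/' not in txt) and is_float(txt):       # bare ID
        txt = 'https://arxiv.org/abs/' + txt
    if ('.' in txt) and ('/' not in txt) and is_float(txt[:10]):  # ID + version suffix, e.g. 2303.12712v5
        txt = 'https://arxiv.org/abs/' + txt[:10]
    return txt

print(normalize('2303.12712'))    # → https://arxiv.org/abs/2303.12712
print(normalize('2303.12712v5'))  # → https://arxiv.org/abs/2303.12712
```

An already-formed URL contains '/', so both branches are skipped and it passes through unchanged.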
@@ -105,6 +114,7 @@ def arxiv_download(chatbot, history, txt):
         return msg, None
     # <-------------- set format ------------->
     arxiv_id = url_.split('/abs/')[-1]
+    if 'v' in arxiv_id: arxiv_id = arxiv_id[:10]
    cached_translation_pdf = check_cached_translation_pdf(arxiv_id)
     if cached_translation_pdf: return cached_translation_pdf, arxiv_id

@@ -137,7 +147,11 @@ def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, histo
     chatbot.append([ "函数插件功能?",
         "对整个Latex项目进行纠错, 用latex编译为PDF对修正处做高亮。函数插件贡献者: Binary-Husky。注意事项: 目前仅支持GPT3.5/GPT4,其他模型转化效果未知。目前对机器学习类文献转化效果最好,其他类型文献转化效果未知。仅在Windows系统进行了测试,其他操作系统表现未知。"])
     yield from update_ui(chatbot=chatbot, history=history) # 刷新界面

+    # <-------------- more requirements ------------->
+    if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
+    more_req = plugin_kwargs.get("advanced_arg", "")
+    _switch_prompt_ = partial(switch_prompt, more_requirement=more_req)

     # <-------------- check deps ------------->
     try:
@@ -146,7 +160,7 @@ def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, histo
         from .latex_utils import Latex精细分解与转化, 编译Latex
     except Exception as e:
         chatbot.append([ f"解析项目: {txt}",
-            f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
+            f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
         yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
         return

@@ -176,23 +190,26 @@ def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, histo


     # <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
-    if not os.path.exists(project_folder + '/merge_proofread.tex'):
-        yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, mode='proofread_latex', switch_prompt=switch_prompt)
+    if not os.path.exists(project_folder + '/merge_proofread_en.tex'):
+        yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
+                                      chatbot, history, system_prompt, mode='proofread_en', switch_prompt=_switch_prompt_)


     # <-------------- compile PDF ------------->
-    success = yield from 编译Latex(chatbot, history, main_file_original='merge', main_file_modified='merge_proofread',
+    success = yield from 编译Latex(chatbot, history, main_file_original='merge', main_file_modified='merge_proofread_en',
                                    work_folder_original=project_folder, work_folder_modified=project_folder, work_folder=project_folder)


     # <-------------- zip PDF ------------->
-    zip_result(project_folder)
+    zip_res = zip_result(project_folder)
     if success:
         chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
         yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
+        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
     else:
         chatbot.append((f"失败了", '虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 也是可读的, 您可以到Github Issue区, 用该压缩包+对话历史存档进行反馈 ...'))
         yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
+        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)

     # <-------------- we are done ------------->
     return success
@@ -205,9 +222,13 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
     # <-------------- information about this plugin ------------->
     chatbot.append([
         "函数插件功能?",
-        "对整个Latex项目进行翻译, 生成中文PDF。函数插件贡献者: Binary-Husky。注意事项: 目前仅支持GPT3.5/GPT4,其他模型转化效果未知。目前对机器学习类文献转化效果最好,其他类型文献转化效果未知。"])
+        "对整个Latex项目进行翻译, 生成中文PDF。函数插件贡献者: Binary-Husky。注意事项: 此插件Windows支持最佳,Linux下必须使用Docker安装,详见项目主README.md。目前仅支持GPT3.5/GPT4,其他模型转化效果未知。目前对机器学习类文献转化效果最好,其他类型文献转化效果未知。"])
     yield from update_ui(chatbot=chatbot, history=history) # 刷新界面

+    # <-------------- more requirements ------------->
+    if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
+    more_req = plugin_kwargs.get("advanced_arg", "")
+    _switch_prompt_ = partial(switch_prompt, more_requirement=more_req)

     # <-------------- check deps ------------->
     try:
@@ -216,7 +237,7 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
         from .latex_utils import Latex精细分解与转化, 编译Latex
     except Exception as e:
         chatbot.append([ f"解析项目: {txt}",
-            f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
+            f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
         yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
         return

@@ -255,21 +276,24 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,

     # <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
     if not os.path.exists(project_folder + '/merge_translate_zh.tex'):
-        yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, mode='translate_zh', switch_prompt=switch_prompt)
+        yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
+                                      chatbot, history, system_prompt, mode='translate_zh', switch_prompt=_switch_prompt_)


     # <-------------- compile PDF ------------->
-    success = yield from 编译Latex(chatbot, history, main_file_original='merge', main_file_modified='merge_translate_zh',
+    success = yield from 编译Latex(chatbot, history, main_file_original='merge', main_file_modified='merge_translate_zh', mode='translate_zh',
                                    work_folder_original=project_folder, work_folder_modified=project_folder, work_folder=project_folder)

     # <-------------- zip PDF ------------->
-    zip_result(project_folder)
+    zip_res = zip_result(project_folder)
     if success:
         chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
         yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
+        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
     else:
         chatbot.append((f"失败了", '虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 也是可读的, 您可以到Github Issue区, 用该压缩包+对话历史存档进行反馈 ...'))
         yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
+        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)


     # <-------------- we are done ------------->
@@ -188,7 +188,15 @@ def test_Latex():
     # txt = r"https://arxiv.org/abs/2305.17608"
     # txt = r"https://arxiv.org/abs/2211.16068"  # ACE
     # txt = r"C:\Users\x\arxiv_cache\2211.16068\workfolder" # ACE
-    txt = r"https://arxiv.org/abs/2002.09253"
+    # txt = r"https://arxiv.org/abs/2002.09253"
+    # txt = r"https://arxiv.org/abs/2306.07831"
+    # txt = r"https://arxiv.org/abs/2212.10156"
+    # txt = r"https://arxiv.org/abs/2211.11559"
+    # txt = r"https://arxiv.org/abs/2303.08774"
+    txt = r"https://arxiv.org/abs/2303.12712"
+    # txt = r"C:\Users\fuqingxu\arxiv_cache\2303.12712\workfolder"


     for cookies, cb, hist, msg in (Latex翻译中文并重新编译PDF)(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
         cli_printer.print(cb) # print(cb)

@@ -217,6 +225,7 @@ def test_Latex():
 # test_数学动画生成manim()
 # test_Langchain知识库()
 # test_Langchain知识库读取()
-test_Latex()
-input("程序完成,回车退出。")
-print("退出。")
+if __name__ == "__main__":
+    test_Latex()
+    input("程序完成,回车退出。")
+    print("退出。")
@@ -698,3 +698,51 @@ def try_install_deps(deps):
     for dep in deps:
         import subprocess, sys
         subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--user', dep])
+
+
+class construct_html():
+    def __init__(self) -> None:
+        self.css = """
+        .row {
+            display: flex;
+            flex-wrap: wrap;
+        }
+
+        .column {
+            flex: 1;
+            padding: 10px;
+        }
+
+        .table-header {
+            font-weight: bold;
+            border-bottom: 1px solid black;
+        }
+
+        .table-row {
+            border-bottom: 1px solid lightgray;
+        }
+
+        .table-cell {
+            padding: 5px;
+        }
+        """
+        self.html_string = f'<!DOCTYPE html><head><meta charset="utf-8"><title>翻译结果</title><style>{self.css}</style></head>'
+
+    def add_row(self, a, b):
+        tmp = """
+        <div class="row table-row">
+            <div class="column table-cell">REPLACE_A</div>
+            <div class="column table-cell">REPLACE_B</div>
+        </div>
+        """
+        from toolbox import markdown_convertion
+        tmp = tmp.replace('REPLACE_A', markdown_convertion(a))
+        tmp = tmp.replace('REPLACE_B', markdown_convertion(b))
+        self.html_string += tmp
+
+    def save_file(self, file_name):
+        with open(f'./gpt_log/{file_name}', 'w', encoding='utf8') as f:
+            f.write(self.html_string.encode('utf-8', 'ignore').decode())
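The new `construct_html` class above accumulates original/translated fragments as a two-column HTML table via simple `REPLACE_A`/`REPLACE_B` templating. A simplified standalone sketch of just that templating (the `TwoColumnHtml` name is ours; it swaps `toolbox.markdown_convertion` for plain-text cells so it runs without the repo):

```python
class TwoColumnHtml:
    # minimal stand-in for construct_html: two cells per row, string templating only
    def __init__(self):
        self.html = '<!DOCTYPE html><head><meta charset="utf-8"></head>'

    def add_row(self, a, b):
        tmp = ('<div class="row table-row">'
               '<div class="column table-cell">REPLACE_A</div>'
               '<div class="column table-cell">REPLACE_B</div>'
               '</div>')
        # substitute both placeholders, then append the finished row
        tmp = tmp.replace('REPLACE_A', a).replace('REPLACE_B', b)
        self.html += tmp

page = TwoColumnHtml()
page.add_row('original paragraph', '翻译后的段落')
print('REPLACE_A' in page.html)  # → False
```

The real class additionally runs each cell through markdown conversion and injects the CSS shown in the hunk.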
@@ -8,31 +8,69 @@ pj = os.path.join
 """
 ========================================================================
 Part One
-Latex segmentation to a linklist
+Latex segmentation with a binary mask (PRESERVE=0, TRANSFORM=1)
 ========================================================================
 """
 PRESERVE = 0
 TRANSFORM = 1

-def split_worker(text, mask, pattern, flags=0):
+def set_forbidden_text(text, mask, pattern, flags=0):
     """
     Add a preserve text area in this paper
+    e.g. with pattern = r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}"
+    you can mask out (mask = PRESERVE so that text become untouchable for GPT)
+    everything between "\begin{equation}" and "\end{equation}"
     """
+    if isinstance(pattern, list): pattern = '|'.join(pattern)
     pattern_compile = re.compile(pattern, flags)
     for res in pattern_compile.finditer(text):
         mask[res.span()[0]:res.span()[1]] = PRESERVE
     return text, mask

-def split_worker_reverse_caption(text, mask, pattern, flags=0):
+def set_forbidden_text_careful_brace(text, mask, pattern, flags=0):
     """
-    Move caption area out of preserve area
+    Add a preserve text area in this paper (text become untouchable for GPT).
+    count the number of the braces so as to catch compelete text area.
+    e.g.
+    \caption{blablablablabla\texbf{blablabla}blablabla.}
     """
     pattern_compile = re.compile(pattern, flags)
     for res in pattern_compile.finditer(text):
-        mask[res.regs[1][0]:res.regs[1][1]] = TRANSFORM
+        brace_level = -1
+        p = begin = end = res.regs[0][0]
+        for _ in range(1024*16):
+            if text[p] == '}' and brace_level == 0: break
+            elif text[p] == '}': brace_level -= 1
+            elif text[p] == '{': brace_level += 1
+            p += 1
+        end = p+1
+        mask[begin:end] = PRESERVE
     return text, mask

-def split_worker_begin_end(text, mask, pattern, flags=0, limit_n_lines=42):
+def reverse_forbidden_text_careful_brace(text, mask, pattern, flags=0, forbid_wrapper=True):
+    """
+    Move area out of preserve area (make text editable for GPT)
+    count the number of the braces so as to catch compelete text area.
+    e.g.
+    \caption{blablablablabla\texbf{blablabla}blablabla.}
+    """
+    pattern_compile = re.compile(pattern, flags)
+    for res in pattern_compile.finditer(text):
+        brace_level = 0
+        p = begin = end = res.regs[1][0]
+        for _ in range(1024*16):
+            if text[p] == '}' and brace_level == 0: break
+            elif text[p] == '}': brace_level -= 1
+            elif text[p] == '{': brace_level += 1
+            p += 1
+        end = p
+        mask[begin:end] = TRANSFORM
+        if forbid_wrapper:
+            mask[res.regs[0][0]:begin] = PRESERVE
+            mask[end:res.regs[0][1]] = PRESERVE
+    return text, mask
+
+def set_forbidden_text_begin_end(text, mask, pattern, flags=0, limit_n_lines=42):
     """
     Find all \begin{} ... \end{} text block that with less than limit_n_lines lines.
     Add it to preserve area
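The hunk above replaces the old linked-list segmentation with a byte-level mask: every character of the LaTeX source is flagged `PRESERVE` (GPT must not touch it) or `TRANSFORM` (GPT may rewrite it). A rough standalone sketch of that idea, copying the diff's `set_forbidden_text` shape onto a toy input (the sample `text` is ours):

```python
import re
import numpy as np

PRESERVE = 0   # byte belongs to a region GPT must not touch
TRANSFORM = 1  # byte may be rewritten/translated by GPT

def set_forbidden_text(text, mask, pattern, flags=0):
    # mark every regex match as PRESERVE, mirroring the helper in the diff
    if isinstance(pattern, list): pattern = '|'.join(pattern)
    for res in re.compile(pattern, flags).finditer(text):
        mask[res.span()[0]:res.span()[1]] = PRESERVE
    return text, mask

text = r"Intro text \begin{equation}x=1\end{equation} outro text"
mask = np.zeros(len(text), dtype=np.uint8) + TRANSFORM
text, mask = set_forbidden_text(text, mask, r"\\begin\{equation\}(.*?)\\end\{equation\}", re.DOTALL)

# the equation block is now frozen; surrounding prose stays editable
frozen = ''.join(c for c, m in zip(text, mask) if m == PRESERVE)
print(frozen)  # → \begin{equation}x=1\end{equation}
```

Passing a list of patterns joins them with `|`, which is how the later `split_subprocess` hunk collapses many per-environment calls into one.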
@@ -85,29 +123,54 @@ Latex Merge File
 def 寻找Latex主文件(file_manifest, mode):
     """
     在多Tex文档中,寻找主文件,必须包含documentclass,返回找到的第一个。
-    P.S. 但愿没人把latex模板放在里面传进来
+    P.S. 但愿没人把latex模板放在里面传进来 (6.25 加入判定latex模板的代码)
     """
+    canidates = []
     for texf in file_manifest:
         if os.path.basename(texf).startswith('merge'):
             continue
         with open(texf, 'r', encoding='utf8') as f:
             file_content = f.read()
         if r'\documentclass' in file_content:
-            return texf
+            canidates.append(texf)
         else:
             continue
-    raise RuntimeError('无法找到一个主Tex文件(包含documentclass关键字)')
+
+    if len(canidates) == 0:
+        raise RuntimeError('无法找到一个主Tex文件(包含documentclass关键字)')
+    elif len(canidates) == 1:
+        return canidates[0]
+    else: # if len(canidates) >= 2 通过一些Latex模板中常见(但通常不会出现在正文)的单词,对不同latex源文件扣分,取评分最高者返回
+        canidates_score = []
+        # 给出一些判定模板文档的词作为扣分项
+        unexpected_words = ['\LaTeX', 'manuscript', 'Guidelines', 'font', 'citations', 'rejected', 'blind review', 'reviewers']
+        expected_words = ['\input', '\ref', '\cite']
+        for texf in canidates:
+            canidates_score.append(0)
+            with open(texf, 'r', encoding='utf8') as f:
+                file_content = f.read()
+            for uw in unexpected_words:
+                if uw in file_content:
+                    canidates_score[-1] -= 1
+            for uw in expected_words:
+                if uw in file_content:
+                    canidates_score[-1] += 1
+        select = np.argmax(canidates_score) # 取评分最高者返回
+        return canidates[select]

 def rm_comments(main_file):
     new_file_remove_comment_lines = []
     for l in main_file.splitlines():
         # 删除整行的空注释
-        if l.startswith("%") or (l.startswith(" ") and l.lstrip().startswith("%")):
+        if l.lstrip().startswith("%"):
             pass
         else:
             new_file_remove_comment_lines.append(l)
     main_file = '\n'.join(new_file_remove_comment_lines)
+    # main_file = re.sub(r"\\include{(.*?)}", r"\\input{\1}", main_file) # 将 \include 命令转换为 \input 命令
     main_file = re.sub(r'(?<!\\)%.*', '', main_file) # 使用正则表达式查找半行注释, 并替换为空字符串
     return main_file

 def merge_tex_files_(project_foler, main_file, mode):
     """
     Merge Tex project recrusively
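The hunk above stops returning the first file containing `\documentclass` and instead scores every candidate: words typical of LaTeX templates (e.g. "Guidelines", "blind review") subtract a point, commands typical of a real paper body (`\input`, `\ref`, `\cite`) add one, and the best-scoring file wins. A minimal sketch of that scoring on hypothetical in-memory contents (the `score` helper and sample strings are ours; the word lists are taken from the diff):

```python
# penalty words that suggest a LaTeX template rather than a real manuscript,
# and bonus words expected in a real paper body — lists copied from the diff
unexpected_words = ['\\LaTeX', 'manuscript', 'Guidelines', 'font', 'citations',
                    'rejected', 'blind review', 'reviewers']
expected_words = ['\\input', '\\ref', '\\cite']

def score(file_content):
    s = 0
    for uw in unexpected_words:
        if uw in file_content: s -= 1
    for ew in expected_words:
        if ew in file_content: s += 1
    return s

candidates = {
    'template.tex': r"\documentclass{article} Guidelines for the manuscript font and citations",
    'paper.tex':    r"\documentclass{article} \input{intro} see \ref{fig1} and \cite{smith}",
}
best = max(candidates, key=lambda k: score(candidates[k]))
print(best)  # → paper.tex
```

It is a heuristic, as the `(6.25 加入判定latex模板的代码)` note admits: a paper whose prose happens to contain "font" loses a point too.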
@@ -138,17 +201,24 @@ def merge_tex_files(project_foler, main_file, mode):
     main_file = rm_comments(main_file)

     if mode == 'translate_zh':
+        # find paper documentclass
         pattern = re.compile(r'\\documentclass.*\n')
         match = pattern.search(main_file)
+        assert match is not None, "Cannot find documentclass statement!"
         position = match.end()
         add_ctex = '\\usepackage{ctex}\n'
         add_url = '\\usepackage{url}\n' if '{url}' not in main_file else ''
         main_file = main_file[:position] + add_ctex + add_url + main_file[position:]
-        # 2 fontset=windows
+        # fontset=windows
         import platform
-        main_file = re.sub(r"\\documentclass\[(.*?)\]{(.*?)}", r"\\documentclass[\1,fontset=windows]{\2}",main_file)
-        main_file = re.sub(r"\\documentclass{(.*?)}", r"\\documentclass[fontset=windows]{\1}",main_file)
+        if platform.system() != 'Windows':
+            main_file = re.sub(r"\\documentclass\[(.*?)\]{(.*?)}", r"\\documentclass[\1,fontset=windows,UTF8]{\2}",main_file)
+            main_file = re.sub(r"\\documentclass{(.*?)}", r"\\documentclass[fontset=windows,UTF8]{\1}",main_file)
+        # find paper abstract
+        pattern_opt1 = re.compile(r'\\begin\{abstract\}.*\n')
+        pattern_opt2 = re.compile(r"\\abstract\{(.*?)\}", flags=re.DOTALL)
+        match_opt1 = pattern_opt1.search(main_file)
+        match_opt2 = pattern_opt2.search(main_file)
+        assert (match_opt1 is not None) or (match_opt2 is not None), "Cannot find paper abstract section!"
     return main_file

@@ -180,19 +250,46 @@ def fix_content(final_tex, node_string):
     final_tex = re.sub(r"\\\ ([a-z]{2,10})\{", r"\\\1{", string=final_tex)
     final_tex = re.sub(r"\\([a-z]{2,10})\{([^\}]*?)\}", mod_inbraket, string=final_tex)

+    if "Traceback" in final_tex and "[Local Message]" in final_tex:
+        final_tex = node_string # 出问题了,还原原文
     if node_string.count('\\begin') != final_tex.count('\\begin'):
         final_tex = node_string # 出问题了,还原原文
     if node_string.count('\_') > 0 and node_string.count('\_') > final_tex.count('\_'):
         # walk and replace any _ without \
         final_tex = re.sub(r"(?<!\\)_", "\\_", final_tex)
-    if node_string.count('{') != node_string.count('}'):
-        if final_tex.count('{') != node_string.count('{'):
-            final_tex = node_string # 出问题了,还原原文
-        if final_tex.count('}') != node_string.count('}'):
-            final_tex = node_string # 出问题了,还原原文
+
+    def compute_brace_level(string):
+        # this function count the number of { and }
+        brace_level = 0
+        for c in string:
+            if c == "{": brace_level += 1
+            elif c == "}": brace_level -= 1
+        return brace_level
+    def join_most(tex_t, tex_o):
+        # this function join translated string and original string when something goes wrong
+        p_t = 0
+        p_o = 0
+        def find_next(string, chars, begin):
+            p = begin
+            while p < len(string):
+                if string[p] in chars: return p, string[p]
+                p += 1
+            return None, None
+        while True:
+            res1, char = find_next(tex_o, ['{','}'], p_o)
+            if res1 is None: break
+            res2, char = find_next(tex_t, [char], p_t)
+            if res2 is None: break
+            p_o = res1 + 1
+            p_t = res2 + 1
+        return tex_t[:p_t] + tex_o[p_o:]
+
+    if compute_brace_level(final_tex) != compute_brace_level(node_string):
+        # 出问题了,还原部分原文,保证括号正确
+        final_tex = join_most(final_tex, node_string)
     return final_tex

-def split_subprocess(txt, project_folder, return_dict):
+def split_subprocess(txt, project_folder, return_dict, opts):
     """
     break down latex file to a linked list,
     each node use a preserve flag to indicate whether it should
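The `fix_content` hunk above drops the raw `{`/`}` count comparison in favor of comparing net brace depth between the GPT output and the original fragment, and partially restores the original when they disagree. A standalone sketch of just the depth check (the helper is copied from the diff; the sample strings are ours):

```python
def compute_brace_level(string):
    # net brace depth: +1 per '{', -1 per '}'
    brace_level = 0
    for c in string:
        if c == "{": brace_level += 1
        elif c == "}": brace_level -= 1
    return brace_level

original   = r"\textbf{bold} and \emph{italic}"
translated = r"\textbf{加粗} and \emph{斜体"  # GPT dropped a closing brace

# unequal net levels signal that the translation broke the brace structure,
# which would make the merged .tex uncompilable
print(compute_brace_level(original), compute_brace_level(translated))  # → 0 1
```

Net depth tolerates reordering of balanced groups, unlike the old exact-count check, while still catching a dropped or extra brace; `join_most` then splices the translated prefix onto the original suffix at the first divergent brace.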
@@ -202,44 +299,33 @@ def split_subprocess(txt, project_folder, return_dict, opts):
     mask = np.zeros(len(txt), dtype=np.uint8) + TRANSFORM

     # 吸收title与作者以上的部分
-    text, mask = split_worker(text, mask, r"(.*?)\\maketitle", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, r"(.*?)\\maketitle", re.DOTALL)
-    # 删除iffalse注释
-    text, mask = split_worker(text, mask, r"\\iffalse(.*?)\\fi", re.DOTALL)
+    # 吸收iffalse注释
+    text, mask = set_forbidden_text(text, mask, r"\\iffalse(.*?)\\fi", re.DOTALL)
-    # 吸收在25行以内的begin-end组合
-    text, mask = split_worker_begin_end(text, mask, r"\\begin\{([a-z\*]*)\}(.*?)\\end\{\1\}", re.DOTALL, limit_n_lines=25)
+    # 吸收在42行以内的begin-end组合
+    text, mask = set_forbidden_text_begin_end(text, mask, r"\\begin\{([a-z\*]*)\}(.*?)\\end\{\1\}", re.DOTALL, limit_n_lines=42)
     # 吸收匿名公式
-    text, mask = split_worker(text, mask, r"\$\$(.*?)\$\$", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [ r"\$\$(.*?)\$\$", r"\\\[.*?\\\]" ], re.DOTALL)
     # 吸收其他杂项
-    text, mask = split_worker(text, mask, r"\\section\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\section\*\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\subsection\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\subsubsection\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\bibliography\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\bibliographystyle\{(.*?)\}")
-    text, mask = split_worker(text, mask, r"\\begin\{lstlisting\}(.*?)\\end\{lstlisting\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{wraptable\}(.*?)\\end\{wraptable\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{wrapfigure\}(.*?)\\end\{wrapfigure\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{wrapfigure\*\}(.*?)\\end\{wrapfigure\*\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{figure\}(.*?)\\end\{figure\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{figure\*\}(.*?)\\end\{figure\*\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{multline\}(.*?)\\end\{multline\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{multline\*\}(.*?)\\end\{multline\*\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{table\}(.*?)\\end\{table\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{table\*\}(.*?)\\end\{table\*\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{minipage\}(.*?)\\end\{minipage\}", re.DOTALL)
-    text, mask = split_worker(text, mask, r"\\begin\{minipage\*\}(.*?)\\end\{minipage\*\}", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [ r"\\section\{(.*?)\}", r"\\section\*\{(.*?)\}", r"\\subsection\{(.*?)\}", r"\\subsubsection\{(.*?)\}" ])
+    text, mask = set_forbidden_text(text, mask, [ r"\\bibliography\{(.*?)\}", r"\\bibliographystyle\{(.*?)\}" ])
+    text, mask = set_forbidden_text(text, mask, r"\\begin\{thebibliography\}.*?\\end\{thebibliography\}", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, r"\\begin\{lstlisting\}(.*?)\\end\{lstlisting\}", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, r"\\begin\{wraptable\}(.*?)\\end\{wraptable\}", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}", re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{wrapfigure\}(.*?)\\end\{wrapfigure\}", r"\\begin\{wrapfigure\*\}(.*?)\\end\{wrapfigure\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{figure\}(.*?)\\end\{figure\}", r"\\begin\{figure\*\}(.*?)\\end\{figure\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{multline\}(.*?)\\end\{multline\}", r"\\begin\{multline\*\}(.*?)\\end\{multline\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{table\}(.*?)\\end\{table\}", r"\\begin\{table\*\}(.*?)\\end\{table\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{minipage\}(.*?)\\end\{minipage\}", r"\\begin\{minipage\*\}(.*?)\\end\{minipage\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{align\*\}(.*?)\\end\{align\*\}", r"\\begin\{align\}(.*?)\\end\{align\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\begin\{equation\}(.*?)\\end\{equation\}", r"\\begin\{equation\*\}(.*?)\\end\{equation\*\}"], re.DOTALL)
+    text, mask = set_forbidden_text(text, mask, [r"\\includepdf\[(.*?)\]\{(.*?)\}", r"\\clearpage", r"\\newpage", r"\\appendix", r"\\tableofcontents", r"\\include\{(.*?)\}"])
+    text, mask = set_forbidden_text(text, mask, [r"\\vspace\{(.*?)\}", r"\\hspace\{(.*?)\}", r"\\label\{(.*?)\}", r"\\begin\{(.*?)\}", r"\\end\{(.*?)\}", r"\\item "])
+    text, mask = set_forbidden_text_careful_brace(text, mask, r"\\hl\{(.*?)\}", re.DOTALL)
+    # reverse 操作必须放在最后
+    text, mask = reverse_forbidden_text_careful_brace(text, mask, r"\\caption\{(.*?)\}", re.DOTALL, forbid_wrapper=True)
+    text, mask = reverse_forbidden_text_careful_brace(text, mask, r"\\abstract\{(.*?)\}", re.DOTALL, forbid_wrapper=True)
|
||||||
text, mask = split_worker(text, mask, r"\\begin\{align\*\}(.*?)\\end\{align\*\}", re.DOTALL)
|
|
||||||
text, mask = split_worker(text, mask, r"\\begin\{align\}(.*?)\\end\{align\}", re.DOTALL)
|
|
||||||
text, mask = split_worker(text, mask, r"\\begin\{equation\}(.*?)\\end\{equation\}", re.DOTALL)
|
|
||||||
text, mask = split_worker(text, mask, r"\\begin\{equation\*\}(.*?)\\end\{equation\*\}", re.DOTALL)
|
|
||||||
text, mask = split_worker(text, mask, r"\\item ")
|
|
||||||
text, mask = split_worker(text, mask, r"\\label\{(.*?)\}")
|
|
||||||
text, mask = split_worker(text, mask, r"\\begin\{(.*?)\}")
|
|
||||||
text, mask = split_worker(text, mask, r"\\vspace\{(.*?)\}")
|
|
||||||
text, mask = split_worker(text, mask, r"\\hspace\{(.*?)\}")
|
|
||||||
text, mask = split_worker(text, mask, r"\\end\{(.*?)\}")
|
|
||||||
# text, mask = split_worker_reverse_caption(text, mask, r"\\caption\{(.*?)\}", re.DOTALL)
|
|
||||||
root = convert_to_linklist(text, mask)
|
root = convert_to_linklist(text, mask)
|
||||||
|
|
||||||
# 修复括号
|
# 修复括号
|
||||||
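The replaced `split_worker` calls and the new `set_forbidden_text` calls both do the same underlying job: run a regex over the LaTeX source and flag the matched spans so they are never sent to the language model. A minimal sketch of that masking idea (`PRESERVE`/`TRANSFORM` values and the list-of-patterns handling are illustrative simplifications, not the project's exact implementation):

```python
import re

PRESERVE = 0   # this region must not be handed to the LLM
TRANSFORM = 1  # this region may be translated / proofread

def set_forbidden_text(text, mask, pattern, flags=0):
    """Simplified sketch: mark every regex match in `text` as PRESERVE in `mask`."""
    if isinstance(pattern, list):
        pattern = '|'.join(pattern)  # several patterns can be folded into one alternation
    for res in re.finditer(pattern, text, flags):
        a, b = res.span()
        mask[a:b] = [PRESERVE] * (b - a)
    return text, mask

text = r"hello \begin{equation}x=1\end{equation} world"
mask = [TRANSFORM] * len(text)
text, mask = set_forbidden_text(text, mask, r"\\begin\{equation\}(.*?)\\end\{equation\}", re.DOTALL)
```

After the call, the characters inside the `equation` environment carry `PRESERVE` while the surrounding prose stays `TRANSFORM`, which is exactly the split that `convert_to_linklist` then turns into nodes.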
@@ -313,7 +399,7 @@ def split_subprocess(txt, project_folder, return_dict):
 prev_node = node
 node = node.next
 if node is None: break
-
+# 输出html调试文件,用红色标注处保留区(PRESERVE),用黑色标注转换区(TRANSFORM)
 with open(pj(project_folder, 'debug_log.html'), 'w', encoding='utf8') as f:
 segment_parts_for_gpt = []
 nodes = []
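The `prev_node` / `node` walk in this hunk iterates the linked list built by `convert_to_linklist`. A plausible minimal sketch of such a traversal, here fusing neighbouring nodes that share a preserve flag (`Node` and `merge_adjacent` are illustrative stand-ins, not the project's classes):

```python
class Node:
    def __init__(self, string, preserve):
        self.string = string      # a fragment of the LaTeX source
        self.preserve = preserve  # True: keep verbatim; False: send to the LLM
        self.next = None

def merge_adjacent(root):
    # walk the list; fuse neighbours whose preserve flags agree
    node = root
    while True:
        if node.next and node.preserve == node.next.preserve:
            node.string += node.next.string
            node.next = node.next.next
            continue
        prev_node = node
        node = node.next
        if node is None: break
    return root

a = Node("hel", True); b = Node("lo ", True); c = Node("world", False)
a.next = b; b.next = c
root = merge_adjacent(a)
```

With the three nodes above, the first two fragments collapse into one `preserve` node and the third stays separate.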
@@ -344,8 +430,8 @@ class LatexPaperSplit():
 """
 def __init__(self) -> None:
 self.nodes = None
-self.msg = "{\\scriptsize\\textbf{警告:该PDF由GPT-Academic开源项目调用大语言模型+Latex翻译插件一键生成," + \
-"版权归原文作者所有。翻译内容可靠性无任何保障,请仔细鉴别并以原文为准。" + \
+self.msg = "*{\\scriptsize\\textbf{警告:该PDF由GPT-Academic开源项目调用大语言模型+Latex翻译插件一键生成," + \
+"版权归原文作者所有。翻译内容可靠性无保障,请仔细鉴别并以原文为准。" + \
 "项目Github地址 \\url{https://github.com/binary-husky/gpt_academic/}。"
 # 请您不要删除或修改这行警告,除非您是论文的原作者(如果您是论文原作者,欢迎加REAME中的QQ联系开发者)
 self.msg_declare = "为了防止大语言模型的意外谬误产生扩散影响,禁止移除或修改此警告。}}\\\\"
@@ -365,11 +451,18 @@ class LatexPaperSplit():
 if mode == 'translate_zh':
 pattern = re.compile(r'\\begin\{abstract\}.*\n')
 match = pattern.search(result_string)
-position = match.end()
+if not match:
+    # match \abstract{xxxx}
+    pattern_compile = re.compile(r"\\abstract\{(.*?)\}", flags=re.DOTALL)
+    match = pattern_compile.search(result_string)
+    position = match.regs[1][0]
+else:
+    # match \begin{abstract}xxxx\end{abstract}
+    position = match.end()
 result_string = result_string[:position] + self.msg + msg + self.msg_declare + result_string[position:]
 return result_string

-def split(self, txt, project_folder):
+def split(self, txt, project_folder, opts):
 """
 break down latex file to a linked list,
 each node use a preserve flag to indicate whether it should
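The fallback added in this hunk locates the insertion point for the warning banner: first try the `\begin{abstract}` environment, and if the paper instead uses the `\abstract{...}` command, insert just inside its brace group (`match.regs[1][0]` is the start offset of capture group 1). A self-contained sketch of that logic (the function name is illustrative):

```python
import re

def find_warning_insert_position(tex):
    """Sketch of the fallback above: try \\begin{abstract}, else \\abstract{...}."""
    match = re.compile(r'\\begin\{abstract\}.*\n').search(tex)
    if not match:
        # match \abstract{xxxx}: insert just inside the brace group
        match = re.compile(r"\\abstract\{(.*?)\}", flags=re.DOTALL).search(tex)
        return match.regs[1][0]
    # match \begin{abstract}xxxx\end{abstract}: insert right after the opener line
    return match.end()

pos1 = find_warning_insert_position("\\begin{abstract}\nSome text")
pos2 = find_warning_insert_position("\\abstract{Some text}")
```

`pos1` lands right after the `\begin{abstract}` line; `pos2` lands on the first character inside `\abstract{...}`.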
@@ -381,9 +474,10 @@ class LatexPaperSplit():
 return_dict = manager.dict()
 p = multiprocessing.Process(
 target=split_subprocess,
-args=(txt, project_folder, return_dict))
+args=(txt, project_folder, return_dict, opts))
 p.start()
 p.join()
+p.close()
 self.nodes = return_dict['nodes']
 self.sp = return_dict['segment_parts_for_gpt']
 return self.sp
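`LatexPaperSplit.split` hands the slow parsing to a child process and collects the result through a managed dict, so a crash or hang in the parser cannot take down the UI process. A minimal sketch of the same pattern (`heavy_parse` is a stand-in for `split_subprocess`, not the project's code; assumes a POSIX fork start method):

```python
import multiprocessing

def heavy_parse(txt, return_dict):
    # stands in for split_subprocess: crunch in a child, report via the shared dict
    return_dict['segment_parts_for_gpt'] = txt.split('.')

def split_in_subprocess(txt):
    manager = multiprocessing.Manager()
    return_dict = manager.dict()
    p = multiprocessing.Process(target=heavy_parse, args=(txt, return_dict))
    p.start()
    p.join()
    p.close()  # release the process handle, as the added line above does
    return return_dict['segment_parts_for_gpt']
```

The added `p.close()` simply releases the finished `Process` object's resources after `join()`.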
@@ -438,9 +532,35 @@ class LatexPaperFileGroup():
 f.write(res)
 return manifest

+def write_html(sp_file_contents, sp_file_result, chatbot, project_folder):
+    # write html
+    try:
+        import shutil
+        from .crazy_utils import construct_html
+        from toolbox import gen_time_str
+        ch = construct_html()
+        orig = ""
+        trans = ""
+        final = []
+        for c,r in zip(sp_file_contents, sp_file_result):
+            final.append(c)
+            final.append(r)
+        for i, k in enumerate(final):
+            if i%2==0:
+                orig = k
+            if i%2==1:
+                trans = k
+                ch.add_row(a=orig, b=trans)
+        create_report_file_name = f"{gen_time_str()}.trans.html"
+        ch.save_file(create_report_file_name)
+        shutil.copyfile(pj('./gpt_log/', create_report_file_name), pj(project_folder, create_report_file_name))
+        promote_file_to_downloadzone(file=f'./gpt_log/{create_report_file_name}', chatbot=chatbot)
+    except:
+        from toolbox import trimmed_format_exc
+        print('writing html result failed:', trimmed_format_exc())
+
-def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, mode='proofread', switch_prompt=None):
+def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, mode='proofread', switch_prompt=None, opts=[]):
 import time, os, re
 from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
 from .latex_utils import LatexPaperFileGroup, merge_tex_files, LatexPaperSplit, 寻找Latex主文件
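The even/odd walk inside `write_html` interleaves originals and translations and then pairs them back up into table rows. Its net effect can be isolated as a tiny pure function (names illustrative; this reduces the flatten-then-walk to what it computes):

```python
def pair_rows(sp_file_contents, sp_file_result):
    # the flatten-then-even/odd walk of write_html, reduced to its effect
    final = []
    for c, r in zip(sp_file_contents, sp_file_result):
        final.append(c)
        final.append(r)
    rows = []
    for i, k in enumerate(final):
        if i % 2 == 0:
            orig = k
        if i % 2 == 1:
            rows.append((orig, k))
    return rows
```

Each row ends up as `(original_fragment, translated_fragment)`, which is what `ch.add_row(a=..., b=...)` receives.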
@@ -469,8 +589,10 @@ def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin
 f.write(merged_content)

 # <-------- 精细切分latex文件 ---------->
+chatbot.append((f"Latex文件融合完成", f'[Local Message] 正在精细切分latex文件,这需要一段时间计算,文档越长耗时越长,请耐心等待。'))
+yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
 lps = LatexPaperSplit()
-res = lps.split(merged_content, project_folder) # 消耗时间的函数
+res = lps.split(merged_content, project_folder, opts) # 消耗时间的函数

 # <-------- 拆分过长的latex片段 ---------->
 pfg = LatexPaperFileGroup()
@@ -513,6 +635,7 @@ def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin
 pfg.get_token_num = None
 objdump(pfg, file=pj(project_folder,'temp.pkl'))

+write_html(pfg.sp_file_contents, pfg.sp_file_result, chatbot=chatbot, project_folder=project_folder)

 # <-------- 写出文件 ---------->
 msg = f"当前大语言模型: {llm_kwargs['llm_model']},当前语言模型温度设定: {llm_kwargs['temperature']}。"
@@ -562,17 +685,18 @@ def compile_latex_with_timeout(command, timeout=60):
 return False
 return True

-def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_folder_original, work_folder_modified, work_folder):
+def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_folder_original, work_folder_modified, work_folder, mode='default'):
 import os, time
 current_dir = os.getcwd()
 n_fix = 1
 max_try = 32
-chatbot.append([f"正在编译PDF文档", f'编译已经开始。当前工作路径为{work_folder},如果程序停顿5分钟以上,则大概率是卡死在Latex里面了。不幸卡死时请直接去该路径下取回翻译结果,或者重启之后再度尝试 ...']); yield from update_ui(chatbot=chatbot, history=history)
+chatbot.append([f"正在编译PDF文档", f'编译已经开始。当前工作路径为{work_folder},如果程序停顿5分钟以上,请直接去该路径下取回翻译结果,或者重启之后再度尝试 ...']); yield from update_ui(chatbot=chatbot, history=history)
 chatbot.append([f"正在编译PDF文档", '...']); yield from update_ui(chatbot=chatbot, history=history); time.sleep(1); chatbot[-1] = list(chatbot[-1]) # 刷新界面
 yield from update_ui_lastest_msg('编译已经开始...', chatbot, history) # 刷新Gradio前端界面

 while True:
 import os

 # https://stackoverflow.com/questions/738755/dont-make-me-manually-abort-a-latex-compile-when-theres-an-error
 yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译原始PDF ...', chatbot, history) # 刷新Gradio前端界面
 os.chdir(work_folder_original); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_original}.tex'); os.chdir(current_dir)
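`compile_latex_with_timeout` exists because a broken `.tex` file can stall `pdflatex` indefinitely, which is also what the retry loop's "5 minutes" warning is about. A plausible sketch of such a helper, built only on the standard `subprocess` module (the real helper may differ in details; timeout here is the only failure it reports):

```python
import subprocess

def compile_latex_with_timeout(command, timeout=60):
    """Sketch: run a shell command, kill it if it exceeds `timeout` seconds."""
    process = subprocess.Popen(command, shell=True,
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        process.communicate()  # reap the killed child
        return False
    return True
```

A command that finishes in time returns `True`; one that hangs past the deadline is killed and reported as `False`, letting the caller move on to the next repair attempt.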
@@ -594,15 +718,16 @@ def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_f
 os.chdir(work_folder_original); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_original}.tex'); os.chdir(current_dir)
 os.chdir(work_folder_modified); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_modified}.tex'); os.chdir(current_dir)

-yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 使用latexdiff生成论文转化前后对比 ...', chatbot, history) # 刷新Gradio前端界面
-print( f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex')
-ok = compile_latex_with_timeout(f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex')
+if mode!='translate_zh':
+    yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 使用latexdiff生成论文转化前后对比 ...', chatbot, history) # 刷新Gradio前端界面
+    print( f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex')
+    ok = compile_latex_with_timeout(f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex')

 yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 正在编译对比PDF ...', chatbot, history) # 刷新Gradio前端界面
 os.chdir(work_folder); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex'); os.chdir(current_dir)
 os.chdir(work_folder); ok = compile_latex_with_timeout(f'bibtex merge_diff.aux'); os.chdir(current_dir)
 os.chdir(work_folder); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex'); os.chdir(current_dir)
 os.chdir(work_folder); ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex'); os.chdir(current_dir)

 # <--------------------->
 os.chdir(current_dir)
@@ -617,13 +742,15 @@ def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_f
 results_ += f"对比PDF编译是否成功: {diff_pdf_success};"
 yield from update_ui_lastest_msg(f'第{n_fix}编译结束:<br/>{results_}...', chatbot, history) # 刷新Gradio前端界面

+if diff_pdf_success:
+    result_pdf = pj(work_folder_modified, f'merge_diff.pdf') # get pdf path
+    promote_file_to_downloadzone(result_pdf, rename_file=None, chatbot=chatbot) # promote file to web UI
 if modified_pdf_success:
 yield from update_ui_lastest_msg(f'转化PDF编译已经成功, 即将退出 ...', chatbot, history) # 刷新Gradio前端界面
-os.chdir(current_dir)
-result_pdf = pj(work_folder_modified, f'{main_file_modified}.pdf')
+result_pdf = pj(work_folder_modified, f'{main_file_modified}.pdf') # get pdf path
 if os.path.exists(pj(work_folder, '..', 'translation')):
 shutil.copyfile(result_pdf, pj(work_folder, '..', 'translation', 'translate_zh.pdf'))
-promote_file_to_downloadzone(result_pdf)
+promote_file_to_downloadzone(result_pdf, rename_file=None, chatbot=chatbot) # promote file to web UI
 return True # 成功啦
 else:
 if n_fix>=max_try: break
@@ -1,4 +1,4 @@
-from toolbox import CatchException, update_ui
+from toolbox import CatchException, update_ui, promote_file_to_downloadzone
 from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
 import re

@@ -29,9 +29,8 @@ def write_chat_to_file(chatbot, history=None, file_name=None):
 for h in history:
 f.write("\n>>>" + h)
 f.write('</code>')
-res = '对话历史写入:' + os.path.abspath(f'./gpt_log/{file_name}')
-print(res)
-return res
+promote_file_to_downloadzone(f'./gpt_log/{file_name}', rename_file=file_name, chatbot=chatbot)
+return '对话历史写入:' + os.path.abspath(f'./gpt_log/{file_name}')

 def gen_file_preview(file_name):
 try:
@@ -8,7 +8,7 @@ def inspect_dependency(chatbot, history):
 import manim
 return True
 except:
-chatbot.append(["导入依赖失败", "使用该模块需要额外依赖,安装方法:```pip install manimgl```"])
+chatbot.append(["导入依赖失败", "使用该模块需要额外依赖,安装方法:```pip install manim manimgl```"])
 yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
 return False
@@ -13,7 +13,9 @@ def 解析PDF(file_name, llm_kwargs, plugin_kwargs, chatbot, history, system_pro
 # 递归地切割PDF文件,每一块(尽量是完整的一个section,比如introduction,experiment等,必要时再进行切割)
 # 的长度必须小于 2500 个 Token
 file_content, page_one = read_and_clean_pdf_text(file_name) # (尝试)按照章节切割PDF
+file_content = file_content.encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
+page_one = str(page_one).encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars

 TOKEN_LIMIT_PER_FRAGMENT = 2500

 from .crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf
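The two added lines use an encode/decode round-trip to silently discard characters (e.g. lone surrogates produced by broken PDF extraction) that cannot be represented in UTF-8. The trick in isolation (function name illustrative):

```python
def drop_non_utf8(s):
    # same trick as the two added lines above: discard anything that
    # cannot survive a UTF-8 encode ('ignore' skips unencodable chars)
    return s.encode('utf-8', 'ignore').decode()
```

Legitimate non-ASCII text survives untouched; only unencodable code points are dropped.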
crazy_functions/虚空终端.py (new file, +131 lines)
@@ -0,0 +1,131 @@
+from toolbox import CatchException, update_ui, gen_time_str
+from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
+from .crazy_utils import input_clipping
+
+prompt = """
+I have to achieve some functionalities by calling one of the functions below.
+Your job is to find the correct funtion to use to satisfy my requirement,
+and then write python code to call this function with correct parameters.
+
+These are functions you are allowed to choose from:
+1.
+功能描述: 总结音视频内容
+调用函数: ConcludeAudioContent(txt, llm_kwargs)
+参数说明:
+    txt: 音频文件的路径
+    llm_kwargs: 模型参数, 永远给定None
+2.
+功能描述: 将每次对话记录写入Markdown格式的文件中
+调用函数: WriteMarkdown()
+3.
+功能描述: 将指定目录下的PDF文件从英文翻译成中文
+调用函数: BatchTranslatePDFDocuments_MultiThreaded(txt, llm_kwargs)
+参数说明:
+    txt: PDF文件所在的路径
+    llm_kwargs: 模型参数, 永远给定None
+4.
+功能描述: 根据文本使用GPT模型生成相应的图像
+调用函数: ImageGeneration(txt, llm_kwargs)
+参数说明:
+    txt: 图像生成所用到的提示文本
+    llm_kwargs: 模型参数, 永远给定None
+5.
+功能描述: 对输入的word文档进行摘要生成
+调用函数: SummarizingWordDocuments(input_path, output_path)
+参数说明:
+    input_path: 待处理的word文档路径
+    output_path: 摘要生成后的文档路径
+
+You should always anwser with following format:
+----------------
+Code:
+```
+class AutoAcademic(object):
+    def __init__(self):
+        self.selected_function = "FILL_CORRECT_FUNCTION_HERE" # e.g., "GenerateImage"
+        self.txt = "FILL_MAIN_PARAMETER_HERE" # e.g., "荷叶上的蜻蜓"
+        self.llm_kwargs = None
+```
+Explanation:
+只有GenerateImage和生成图像相关, 因此选择GenerateImage函数。
+----------------
+
+Now, this is my requirement:
+
+"""
+def get_fn_lib():
+    return {
+        "BatchTranslatePDFDocuments_MultiThreaded": ("crazy_functions.批量翻译PDF文档_多线程", "批量翻译PDF文档"),
+        "SummarizingWordDocuments": ("crazy_functions.总结word文档", "总结word文档"),
+        "ImageGeneration": ("crazy_functions.图片生成", "图片生成"),
+        "TranslateMarkdownFromEnglishToChinese": ("crazy_functions.批量Markdown翻译", "Markdown中译英"),
+        "SummaryAudioVideo": ("crazy_functions.总结音视频", "总结音视频"),
+    }
+
+def inspect_dependency(chatbot, history):
+    return True
+
+def eval_code(code, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
+    import subprocess, sys, os, shutil, importlib
+
+    with open('gpt_log/void_terminal_runtime.py', 'w', encoding='utf8') as f:
+        f.write(code)
+
+    try:
+        AutoAcademic = getattr(importlib.import_module('gpt_log.void_terminal_runtime', 'AutoAcademic'), 'AutoAcademic')
+        # importlib.reload(AutoAcademic)
+        auto_dict = AutoAcademic()
+        selected_function = auto_dict.selected_function
+        txt = auto_dict.txt
+        fp, fn = get_fn_lib()[selected_function]
+        fn_plugin = getattr(importlib.import_module(fp, fn), fn)
+        yield from fn_plugin(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port)
+    except:
+        from toolbox import trimmed_format_exc
+        chatbot.append(["执行错误", f"\n```\n{trimmed_format_exc()}\n```\n"])
+        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
+
+def get_code_block(reply):
+    import re
+    pattern = r"```([\s\S]*?)```" # regex pattern to match code blocks
+    matches = re.findall(pattern, reply) # find all code blocks in text
+    if len(matches) != 1:
+        raise RuntimeError("GPT is not generating proper code.")
+    return matches[0].strip('python') # code block
+
+@CatchException
+def 终端(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
+    """
+    txt 输入栏用户输入的文本, 例如需要翻译的一段话, 再例如一个包含了待处理文件的路径
+    llm_kwargs gpt模型参数, 如温度和top_p等, 一般原样传递下去就行
+    plugin_kwargs 插件模型的参数, 暂时没有用武之地
+    chatbot 聊天显示框的句柄, 用于显示给用户
+    history 聊天历史, 前情提要
+    system_prompt 给gpt的静默提醒
+    web_port 当前软件运行的端口号
+    """
+    # 清空历史, 以免输入溢出
+    history = []
+
+    # 基本信息:功能、贡献者
+    chatbot.append(["函数插件功能?", "根据自然语言执行插件命令, 作者: binary-husky, 插件初始化中 ..."])
+    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
+
+    # # 尝试导入依赖, 如果缺少依赖, 则给出安装建议
+    # dep_ok = yield from inspect_dependency(chatbot=chatbot, history=history) # 刷新界面
+    # if not dep_ok: return
+
+    # 输入
+    i_say = prompt + txt
+    # 开始
+    gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
+        inputs=i_say, inputs_show_user=txt,
+        llm_kwargs=llm_kwargs, chatbot=chatbot, history=[],
+        sys_prompt=""
+    )
+
+    # 将代码转为动画
+    code = get_code_block(gpt_say)
+    yield from eval_code(code, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port)
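One detail in the new file's `get_code_block` is worth noting: `matches[0].strip('python')` strips *characters* from the set `p, y, t, h, o, n` off both ends, so code that happens to start or end with one of those letters would be eaten. A sketch of the same extractor that drops only a leading language tag (function and message names are illustrative, not the plugin's):

```python
import re

def extract_single_code_block(reply):
    # like get_code_block above, but drop only a leading "python" language tag
    # instead of str.strip('python'), which strips those *characters* from both ends
    matches = re.findall(r"```([\s\S]*?)```", reply)
    if len(matches) != 1:
        raise RuntimeError("reply must contain exactly one fenced code block")
    code = matches[0]
    if code.startswith('python'):
        code = code[len('python'):]
    return code
```

Requiring exactly one fenced block, as the plugin does, is a cheap guard against the model emitting prose plus several snippets.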
@@ -103,3 +103,30 @@ services:
 echo '[jittorllms] 正在从github拉取最新代码...' &&
 git --git-dir=request_llm/jittorllms/.git --work-tree=request_llm/jittorllms pull --force &&
 python3 -u main.py"
+
+
+## ===================================================
+## 【方案四】 chatgpt + Latex
+## ===================================================
+version: '3'
+services:
+  gpt_academic_with_latex:
+    image: ghcr.io/binary-husky/gpt_academic_with_latex:master
+    environment:
+      # 请查阅 `config.py` 以查看所有的配置信息
+      API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
+      USE_PROXY: ' True '
+      proxies: ' { "http": "socks5h://localhost:10880", "https": "socks5h://localhost:10880", } '
+      LLM_MODEL: ' gpt-3.5-turbo '
+      AVAIL_LLM_MODELS: ' ["gpt-3.5-turbo", "gpt-4"] '
+      LOCAL_MODEL_DEVICE: ' cuda '
+      DEFAULT_WORKER_NUM: ' 10 '
+      WEB_PORT: ' 12303 '
+
+    # 与宿主的网络融合
+    network_mode: "host"
+
+    # 不使用代理网络拉取最新代码
+    command: >
+      bash -c "python3 -u main.py"
docs/GithubAction+NoLocal+Latex (new file, +25 lines)
@@ -0,0 +1,25 @@
+# 此Dockerfile适用于“无本地模型”的环境构建,如果需要使用chatglm等本地模型,请参考 docs/Dockerfile+ChatGLM
+# - 1 修改 `config.py`
+# - 2 构建 docker build -t gpt-academic-nolocal-latex -f docs/Dockerfile+NoLocal+Latex .
+# - 3 运行 docker run -v /home/fuqingxu/arxiv_cache:/root/arxiv_cache --rm -it --net=host gpt-academic-nolocal-latex
+
+FROM fuqingxu/python311_texlive_ctex:latest
+
+# 指定路径
+WORKDIR /gpt
+
+RUN pip3 install gradio openai numpy arxiv rich
+RUN pip3 install colorama Markdown pygments pymupdf
+
+# 装载项目文件
+COPY . .
+
+# 安装依赖
+RUN pip3 install -r requirements.txt
+
+# 可选步骤,用于预热模块
+RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
+
+# 启动
+CMD ["python3", "-u", "main.py"]
@@ -58,6 +58,8 @@
 "连接网络回答问题": "ConnectToNetworkToAnswerQuestions",
 "联网的ChatGPT": "ChatGPTConnectedToNetwork",
 "解析任意code项目": "ParseAnyCodeProject",
+"读取知识库作答": "ReadKnowledgeArchiveAnswerQuestions",
+"知识库问答": "UpdateKnowledgeArchive",
 "同时问询_指定模型": "InquireSimultaneously_SpecifiedModel",
 "图片生成": "ImageGeneration",
 "test_解析ipynb文件": "Test_ParseIpynbFile",
docs/use_azure.md (new file, +152 lines)
@@ -0,0 +1,152 @@
|
# 通过微软Azure云服务申请 Openai API
|
||||||
|
|
||||||
|
由于Openai和微软的关系,现在是可以通过微软的Azure云计算服务直接访问openai的api,免去了注册和网络的问题。
|
||||||
|
|
||||||
|
快速入门的官方文档的链接是:[快速入门 - 开始通过 Azure OpenAI 服务使用 ChatGPT 和 GPT-4 - Azure OpenAI Service | Microsoft Learn](https://learn.microsoft.com/zh-cn/azure/cognitive-services/openai/chatgpt-quickstart?pivots=programming-language-python)
|
||||||
|
|
||||||
|
# 申请API
|
||||||
|
|
||||||
|
按文档中的“先决条件”的介绍,出了编程的环境以外,还需要以下三个条件:
|
||||||
|
|
||||||
|
1. Azure账号并创建订阅
|
||||||
|
|
||||||
|
2. 为订阅添加Azure OpenAI 服务
|
||||||
|
|
||||||
|
3. 部署模型
|
||||||
|
|
||||||
|
## Azure账号并创建订阅
|
||||||
|
|
||||||
|
### Azure账号
|
||||||
|
|
||||||
|
创建Azure的账号时最好是有微软的账号,这样似乎更容易获得免费额度(第一个月的200美元,实测了一下,如果用一个刚注册的微软账号登录Azure的话,并没有这一个月的免费额度)。
|
||||||
|
|
||||||
|
创建Azure账号的网址是:[立即创建 Azure 免费帐户 | Microsoft Azure](https://azure.microsoft.com/zh-cn/free/)
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
打开网页后,点击 “免费开始使用” 会跳转到登录或注册页面,如果有微软的账户,直接登录即可,如果没有微软账户,那就需要到微软的网页再另行注册一个。
|
||||||
|
|
||||||
|
注意,Azure的页面和政策时不时会变化,已实际最新显示的为准就好。
|
||||||
|
|
||||||
|
### 创建订阅
|
||||||
|
|
||||||
|
注册好Azure后便可进入主页:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
首先需要在订阅里进行添加操作,点开后即可进入订阅的页面:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
第一次进来应该是空的,点添加即可创建新的订阅(可以是“免费”或者“即付即用”的订阅),其中订阅ID是后面申请Azure OpenAI需要使用的。
|
||||||
|
|
||||||
|
## 为订阅添加Azure OpenAI服务
|
||||||
|
|
||||||
|
之后回到首页,点Azure OpenAI即可进入OpenAI服务的页面(如果不显示的话,则在首页上方的搜索栏里搜索“openai”即可)。
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
不过现在这个服务还不能用。在使用前,还需要在这个网址申请一下:
|
||||||
|
|
||||||
|
[Request Access to Azure OpenAI Service (microsoft.com)](https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_Pz4b1_xUOFA5Qk1UWDRBMjg0WFhPMkIzTzhKQ1dWNyQlQCN0PWcu)
|
||||||
|
|
||||||
|
这里有二十来个问题,按照要求和自己的实际情况填写即可。
|
||||||
|
|
||||||
|
其中需要注意的是
|
||||||
|
|
||||||
|
1. 千万记得填对"订阅ID"
|
||||||
|
|
||||||
|
2. 需要填一个公司邮箱(可以不是注册用的邮箱)和公司网址
|
||||||
|
|
||||||
|
之后,在回到上面那个页面,点创建,就会进入创建页面了:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
需要填入“资源组”和“名称”,按照自己的需要填入即可。
|
||||||
|
|
||||||
|
完成后,在主页的“资源”里就可以看到刚才创建的“资源”了,点击进入后,就可以进行最后的部署了。
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
## 部署模型
|
||||||
|
|
||||||
|
进入资源页面后,在部署模型前,可以先点击“开发”,把密钥和终结点记下来。
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
之后,就可以去部署模型了,点击“部署”即可,会跳转到 Azure OpenAI Stuido 进行下面的操作:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
进入 Azure OpenAi Studio 后,点击新建部署,会弹出如下对话框:
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
在这里选 gpt-35-turbo 或需要的模型并按需要填入“部署名”即可完成模型的部署。
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
这个部署名需要记下来。
|
||||||
|
|
||||||
|
到现在为止,申请操作就完成了,需要记下来的有下面几个东西:
|
||||||
|
|
||||||
|
● 密钥(1或2都可以)
|
||||||
|
|
||||||
|
● 终结点
|
||||||
|
|
||||||
|
● 部署名(不是模型名)
|
||||||
|
|
||||||
|
# 修改 config.py
|
||||||
|
|
||||||
|
```
|
||||||
|
AZURE_ENDPOINT = "填入终结点"
|
||||||
|
AZURE_API_KEY = "填入azure openai api的密钥"
|
||||||
|
AZURE_API_VERSION = "2023-05-15" # 默认使用 2023-05-15 版本,无需修改
|
||||||
|
AZURE_ENGINE = "填入部署名"
|
||||||
|
|
||||||
|
```
|
||||||
|
# Using the API

Next comes actually calling the API; again, the official documentation is a good reference: [Quickstart - Get started using ChatGPT and GPT-4 with Azure OpenAI Service | Microsoft Learn](https://learn.microsoft.com/zh-cn/azure/cognitive-services/openai/chatgpt-quickstart?pivots=programming-language-python)

It is quite similar to calling OpenAI's own API: you still install the openai library; what differs is how the call is configured:

```
import os
import openai

openai.api_type = "azure" # fixed value; no need to change
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT") # the "endpoint" recorded earlier
openai.api_version = "2023-05-15" # fixed value; no need to change
openai.api_key = os.getenv("AZURE_OPENAI_KEY") # "key 1" or "key 2" recorded earlier

response = openai.ChatCompletion.create(
    engine="gpt-35-turbo", # this is the deployment name, not the model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Does Azure OpenAI support customer managed keys?"},
        {"role": "assistant", "content": "Yes, customer managed keys are supported by Azure OpenAI."},
        {"role": "user", "content": "Do other Azure Cognitive Services support this too?"}
    ]
)

print(response)
print(response['choices'][0]['message']['content'])
```

Two things to note:

1. The `engine` argument takes the deployment name, not the model name.

2. The `response` obtained through the openai library differs from a `response` obtained by fetching a URL with the requests library: it needs no decoding, since it is already parsed JSON, so you can read it directly by key.

For further details, see the official API documentation.
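The second note above can be illustrated without touching the network. Below is a minimal sketch: the response dict is hand-built to mimic the shape of a ChatCompletion result (its contents are illustrative, not real API output), contrasting direct key access on an already-parsed response with the decode step a raw requests call would need.

```python
import json

# Hand-built stand-in for a parsed ChatCompletion response (illustrative only).
parsed_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Yes, other services support it too."}}
    ]
}

# With the openai library, the result is already a parsed structure:
content = parsed_response['choices'][0]['message']['content']

# With the requests library, you would first have to decode the raw body yourself:
raw_body = json.dumps(parsed_response)   # what an HTTP response body would look like
decoded = json.loads(raw_body)           # the extra decode step
assert decoded['choices'][0]['message']['content'] == content

print(content)
```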
# About cost

The Azure OpenAI API is not free (a free subscription is only valid for one month). Pricing is as follows:



For details, see: [Azure OpenAI Service - Pricing | Microsoft Azure](https://azure.microsoft.com/zh-cn/pricing/details/cognitive-services/openai-service/?cdn=disable)

So it is not the "free for a whole year" deal some posts online claim, but registration and network access are both somewhat simpler than using OpenAI's API directly.
## main.py (4 changed lines)

```diff
@@ -155,7 +155,7 @@ def main():
     for k in crazy_fns:
         if not crazy_fns[k].get("AsButton", True): continue
         click_handle = crazy_fns[k]["Button"].click(ArgsGeneralWrapper(crazy_fns[k]["Function"]), [*input_combo, gr.State(PORT)], output_combo)
-        click_handle.then(on_report_generated, [file_upload, chatbot], [file_upload, chatbot])
+        click_handle.then(on_report_generated, [cookies, file_upload, chatbot], [cookies, file_upload, chatbot])
         cancel_handles.append(click_handle)
     # 函数插件-下拉菜单与随变按钮的互动
     def on_dropdown_changed(k):
@@ -175,7 +175,7 @@ def main():
         if k in [r"打开插件列表", r"请先从插件列表中选择"]: return
         yield from ArgsGeneralWrapper(crazy_fns[k]["Function"])(*args, **kwargs)
     click_handle = switchy_bt.click(route,[switchy_bt, *input_combo, gr.State(PORT)], output_combo)
-    click_handle.then(on_report_generated, [file_upload, chatbot], [file_upload, chatbot])
+    click_handle.then(on_report_generated, [cookies, file_upload, chatbot], [cookies, file_upload, chatbot])
     cancel_handles.append(click_handle)
     # 终止按钮的回调函数注册
     stopBtn.click(fn=None, inputs=None, outputs=None, cancels=cancel_handles)
```
## request_llm/bridge_all.py

```diff
@@ -16,6 +16,9 @@ from toolbox import get_conf, trimmed_format_exc
 from .bridge_chatgpt import predict_no_ui_long_connection as chatgpt_noui
 from .bridge_chatgpt import predict as chatgpt_ui
 
+from .bridge_azure_test import predict_no_ui_long_connection as azure_noui
+from .bridge_azure_test import predict as azure_ui
+
 from .bridge_chatglm import predict_no_ui_long_connection as chatglm_noui
 from .bridge_chatglm import predict as chatglm_ui
 
@@ -83,6 +86,33 @@ model_info = {
         "tokenizer": tokenizer_gpt35,
         "token_cnt": get_token_num_gpt35,
     },
+
+    "gpt-3.5-turbo-16k": {
+        "fn_with_ui": chatgpt_ui,
+        "fn_without_ui": chatgpt_noui,
+        "endpoint": openai_endpoint,
+        "max_token": 1024*16,
+        "tokenizer": tokenizer_gpt35,
+        "token_cnt": get_token_num_gpt35,
+    },
+
+    "gpt-3.5-turbo-0613": {
+        "fn_with_ui": chatgpt_ui,
+        "fn_without_ui": chatgpt_noui,
+        "endpoint": openai_endpoint,
+        "max_token": 4096,
+        "tokenizer": tokenizer_gpt35,
+        "token_cnt": get_token_num_gpt35,
+    },
+
+    "gpt-3.5-turbo-16k-0613": {
+        "fn_with_ui": chatgpt_ui,
+        "fn_without_ui": chatgpt_noui,
+        "endpoint": openai_endpoint,
+        "max_token": 1024 * 16,
+        "tokenizer": tokenizer_gpt35,
+        "token_cnt": get_token_num_gpt35,
+    },
 
     "gpt-4": {
         "fn_with_ui": chatgpt_ui,
@@ -93,6 +123,16 @@ model_info = {
         "token_cnt": get_token_num_gpt4,
     },
+
+    # azure openai
+    "azure-gpt35":{
+        "fn_with_ui": azure_ui,
+        "fn_without_ui": azure_noui,
+        "endpoint": get_conf("AZURE_ENDPOINT"),
+        "max_token": 4096,
+        "tokenizer": tokenizer_gpt35,
+        "token_cnt": get_token_num_gpt35,
+    },
 
     # api_2d
     "api2d-gpt-3.5-turbo": {
         "fn_with_ui": chatgpt_ui,
```
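The model_info additions above follow a simple registry pattern: each model name maps to the UI/no-UI bridge callables and limits used to serve it, and callers dispatch by looking the model name up. A minimal stand-alone sketch of that dispatch idea (the function names and fields here are illustrative stand-ins, not the project's exact API):

```python
# Minimal registry-dispatch sketch (illustrative; not the project's exact API).
def azure_ui(prompt):        # stand-in for the Azure bridge function
    return f"[azure] {prompt}"

def chatgpt_ui(prompt):      # stand-in for the OpenAI bridge function
    return f"[openai] {prompt}"

# Each entry records which bridge serves the model and its token limit.
model_info = {
    "gpt-3.5-turbo-16k": {"fn_with_ui": chatgpt_ui, "max_token": 1024 * 16},
    "azure-gpt35":       {"fn_with_ui": azure_ui,   "max_token": 4096},
}

def predict(llm_model, prompt):
    # Look the model up once and route the call through its registered bridge.
    entry = model_info[llm_model]
    return entry["fn_with_ui"](prompt)

print(predict("azure-gpt35", "hello"))   # routed to the Azure stand-in
```

Adding a new backend then only requires registering one more dict entry, which is exactly what the diff above does for the Azure bridge.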
## request_llm/bridge_azure_test.py (new file, 241 lines)

```python
"""
    该文件中主要包含三个函数

    不具备多线程能力的函数:
    1. predict: 正常对话时使用,具备完备的交互功能,不可多线程

    具备多线程调用能力的函数
    2. predict_no_ui:高级实验性功能模块调用,不会实时显示在界面上,参数简单,可以多线程并行,方便实现复杂的功能逻辑
    3. predict_no_ui_long_connection:在实验过程中发现调用predict_no_ui处理长文档时,和openai的连接容易断掉,这个函数用stream的方式解决这个问题,同样支持多线程
"""

import logging
import traceback
import importlib
import openai
import time


# 读取config.py文件中关于AZURE OPENAI API的信息
from toolbox import get_conf, update_ui, clip_history, trimmed_format_exc
TIMEOUT_SECONDS, MAX_RETRY, AZURE_ENGINE, AZURE_ENDPOINT, AZURE_API_VERSION, AZURE_API_KEY = \
    get_conf('TIMEOUT_SECONDS', 'MAX_RETRY', "AZURE_ENGINE", "AZURE_ENDPOINT", "AZURE_API_VERSION", "AZURE_API_KEY")


def get_full_error(chunk, stream_response):
    """
        获取完整的从Openai返回的报错
    """
    while True:
        try:
            chunk += next(stream_response)
        except:
            break
    return chunk


def predict(inputs, llm_kwargs, plugin_kwargs, chatbot, history=[], system_prompt='', stream=True, additional_fn=None):
    """
        发送至azure openai api,流式获取输出。
        用于基础的对话功能。
        inputs 是本次问询的输入
        top_p, temperature是chatGPT的内部调优参数
        history 是之前的对话列表(注意无论是inputs还是history,内容太长了都会触发token数量溢出的错误)
        chatbot 为WebUI中显示的对话列表,修改它,然后yield出去,可以直接修改对话界面内容
        additional_fn代表点击的哪个按钮,按钮见functional.py
    """
    print(llm_kwargs["llm_model"])

    if additional_fn is not None:
        import core_functional
        importlib.reload(core_functional)    # 热更新prompt
        core_functional = core_functional.get_core_functions()
        if "PreProcess" in core_functional[additional_fn]: inputs = core_functional[additional_fn]["PreProcess"](inputs)  # 获取预处理函数(如果有的话)
        inputs = core_functional[additional_fn]["Prefix"] + inputs + core_functional[additional_fn]["Suffix"]

    raw_input = inputs
    logging.info(f'[raw_input] {raw_input}')
    chatbot.append((inputs, ""))
    yield from update_ui(chatbot=chatbot, history=history, msg="等待响应")  # 刷新界面

    payload = generate_azure_payload(inputs, llm_kwargs, history, system_prompt, stream)

    history.append(inputs); history.append("")

    retry = 0
    while True:
        try:
            openai.api_type = "azure"
            openai.api_version = AZURE_API_VERSION
            openai.api_base = AZURE_ENDPOINT
            openai.api_key = AZURE_API_KEY
            response = openai.ChatCompletion.create(timeout=TIMEOUT_SECONDS, **payload); break
        except:
            retry += 1
            chatbot[-1] = ((chatbot[-1][0], "获取response失败,重试中。。。"))
            retry_msg = f",正在重试 ({retry}/{MAX_RETRY}) ……" if MAX_RETRY > 0 else ""
            yield from update_ui(chatbot=chatbot, history=history, msg="请求超时"+retry_msg)  # 刷新界面
            if retry > MAX_RETRY: raise TimeoutError

    gpt_replying_buffer = ""
    is_head_of_the_stream = True
    if stream:
        stream_response = response
        while True:
            try:
                chunk = next(stream_response)
            except StopIteration:
                from toolbox import regular_txt_to_markdown; tb_str = '```\n' + trimmed_format_exc() + '```'
                chatbot[-1] = (chatbot[-1][0], f"[Local Message] 远程返回错误: \n\n{tb_str} \n\n{regular_txt_to_markdown(chunk)}")
                yield from update_ui(chatbot=chatbot, history=history, msg="远程返回错误:" + chunk)  # 刷新界面
                return

            if is_head_of_the_stream and (r'"object":"error"' not in chunk):
                # 数据流的第一帧不携带content
                is_head_of_the_stream = False; continue

            if chunk:
                # print(chunk)
                try:
                    if "delta" in chunk["choices"][0]:
                        if chunk["choices"][0]["finish_reason"] == "stop":
                            logging.info(f'[response] {gpt_replying_buffer}')
                            break
                        status_text = f"finish_reason: {chunk['choices'][0]['finish_reason']}"
                        gpt_replying_buffer = gpt_replying_buffer + chunk["choices"][0]["delta"]["content"]

                        history[-1] = gpt_replying_buffer
                        chatbot[-1] = (history[-2], history[-1])
                        yield from update_ui(chatbot=chatbot, history=history, msg=status_text)  # 刷新界面

                except Exception as e:
                    traceback.print_exc()
                    yield from update_ui(chatbot=chatbot, history=history, msg="Json解析不合常规")  # 刷新界面
                    chunk = get_full_error(chunk, stream_response)

                    error_msg = chunk
                    yield from update_ui(chatbot=chatbot, history=history, msg="Json异常" + error_msg)  # 刷新界面
                    return


def predict_no_ui_long_connection(inputs, llm_kwargs, history=[], sys_prompt="", observe_window=None, console_slience=False):
    """
        发送至AZURE OPENAI API,等待回复,一次性完成,不显示中间过程。但内部用stream的方法避免中途网线被掐。
        inputs:
            是本次问询的输入
        sys_prompt:
            系统静默prompt
        llm_kwargs:
            chatGPT的内部调优参数
        history:
            是之前的对话列表
        observe_window = None:
            用于负责跨越线程传递已经输出的部分,大部分时候仅仅为了fancy的视觉效果,留空即可。observe_window[0]:观测窗。observe_window[1]:看门狗
    """
    watch_dog_patience = 5  # 看门狗的耐心, 设置5秒即可
    payload = generate_azure_payload(inputs, llm_kwargs, history, system_prompt=sys_prompt, stream=True)
    retry = 0
    while True:

        try:
            openai.api_type = "azure"
            openai.api_version = AZURE_API_VERSION
            openai.api_base = AZURE_ENDPOINT
            openai.api_key = AZURE_API_KEY
            response = openai.ChatCompletion.create(timeout=TIMEOUT_SECONDS, **payload); break

        except:
            retry += 1
            traceback.print_exc()
            if retry > MAX_RETRY: raise TimeoutError
            if MAX_RETRY != 0: print(f'请求超时,正在重试 ({retry}/{MAX_RETRY}) ……')

    stream_response = response
    result = ''
    while True:
        try: chunk = next(stream_response)
        except StopIteration:
            break
        except:
            chunk = next(stream_response)  # 失败了,重试一次?再失败就没办法了。

        if len(chunk) == 0: continue
        if not chunk.startswith('data:'):
            error_msg = get_full_error(chunk, stream_response)
            if "reduce the length" in error_msg:
                raise ConnectionAbortedError("AZURE OPENAI API拒绝了请求:" + error_msg)
            else:
                raise RuntimeError("AZURE OPENAI API拒绝了请求:" + error_msg)
        if ('data: [DONE]' in chunk): break

        delta = chunk["delta"]
        if len(delta) == 0: break
        if "role" in delta: continue
        if "content" in delta:
            result += delta["content"]
            if not console_slience: print(delta["content"], end='')
            if observe_window is not None:
                # 观测窗,把已经获取的数据显示出去
                if len(observe_window) >= 1: observe_window[0] += delta["content"]
                # 看门狗,如果超过期限没有喂狗,则终止
                if len(observe_window) >= 2:
                    if (time.time()-observe_window[1]) > watch_dog_patience:
                        raise RuntimeError("用户取消了程序。")
        else: raise RuntimeError("意外Json结构:"+delta)
    if chunk['finish_reason'] == 'length':
        raise ConnectionAbortedError("正常结束,但显示Token不足,导致输出不完整,请削减单次输入的文本量。")
    return result


def generate_azure_payload(inputs, llm_kwargs, history, system_prompt, stream):
    """
        整合所有信息,选择LLM模型,生成 azure openai api请求,为发送请求做准备
    """

    conversation_cnt = len(history) // 2

    messages = [{"role": "system", "content": system_prompt}]
    if conversation_cnt:
        for index in range(0, 2*conversation_cnt, 2):
            what_i_have_asked = {}
            what_i_have_asked["role"] = "user"
            what_i_have_asked["content"] = history[index]
            what_gpt_answer = {}
            what_gpt_answer["role"] = "assistant"
            what_gpt_answer["content"] = history[index+1]
            if what_i_have_asked["content"] != "":
                if what_gpt_answer["content"] == "": continue
                messages.append(what_i_have_asked)
                messages.append(what_gpt_answer)
            else:
                messages[-1]['content'] = what_gpt_answer['content']

    what_i_ask_now = {}
    what_i_ask_now["role"] = "user"
    what_i_ask_now["content"] = inputs
    messages.append(what_i_ask_now)

    payload = {
        "model": llm_kwargs['llm_model'],
        "messages": messages,
        "temperature": llm_kwargs['temperature'],  # 1.0,
        "top_p": llm_kwargs['top_p'],  # 1.0,
        "n": 1,
        "stream": stream,
        "presence_penalty": 0,
        "frequency_penalty": 0,
        "engine": AZURE_ENGINE
    }
    try:
        print(f" {llm_kwargs['llm_model']} : {conversation_cnt} : {inputs[:100]} ..........")
    except:
        print('输入中可能存在乱码。')
    return payload
```
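The history-folding logic in generate_azure_payload above can be exercised in isolation. Below is a stand-alone sketch of just that part (a simplified, renamed helper, not the project's function): it folds an alternating [user, assistant, user, assistant, ...] history into chat messages, skipping question/answer pairs whose answer is empty, then appends the current question.

```python
def history_to_messages(history, system_prompt, inputs):
    # Fold alternating [user, assistant, ...] history into chat messages,
    # mirroring the pairing logic of generate_azure_payload (simplified sketch).
    messages = [{"role": "system", "content": system_prompt}]
    for i in range(0, len(history) // 2 * 2, 2):
        asked = {"role": "user", "content": history[i]}
        answered = {"role": "assistant", "content": history[i + 1]}
        if asked["content"] != "":
            if answered["content"] == "":
                continue                    # drop pairs with an empty answer
            messages.append(asked)
            messages.append(answered)
        else:
            # An empty question means the answer replaces the previous content.
            messages[-1]['content'] = answered['content']
    messages.append({"role": "user", "content": inputs})
    return messages

msgs = history_to_messages(["hi", "hello!", "dropped", ""], "You are helpful.", "next question")
print([m["role"] for m in msgs])   # ['system', 'user', 'assistant', 'user']
```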
## toolbox.py (48 changed lines)

```diff
@@ -6,6 +6,7 @@ import re
 import os
 from latex2mathml.converter import convert as tex2mathml
 from functools import wraps, lru_cache
+pj = os.path.join
 
 """
 ========================================================================
@@ -221,16 +222,21 @@ def text_divide_paragraph(text):
     """
     将文本按照段落分隔符分割开,生成带有段落标签的HTML代码。
     """
+    pre = '<div class="markdown-body">'
+    suf = '</div>'
+    if text.startswith(pre) and text.endswith(suf):
+        return text
+
     if '```' in text:
         # careful input
-        return text
+        return pre + text + suf
     else:
         # wtf input
         lines = text.split("\n")
         for i, line in enumerate(lines):
             lines[i] = lines[i].replace(" ", "&nbsp;")
         text = "</br>".join(lines)
-        return text
+        return pre + text + suf
 
 @lru_cache(maxsize=128) # 使用 lru缓存 加快转换速度
 def markdown_convertion(txt):
@@ -342,8 +348,11 @@ def format_io(self, y):
     if y is None or y == []:
         return []
     i_ask, gpt_reply = y[-1]
-    i_ask = text_divide_paragraph(i_ask) # 输入部分太自由,预处理一波
-    gpt_reply = close_up_code_segment_during_stream(gpt_reply) # 当代码输出半截的时候,试着补上后个```
+    # 输入部分太自由,预处理一波
+    if i_ask is not None: i_ask = text_divide_paragraph(i_ask)
+    # 当代码输出半截的时候,试着补上后个```
+    if gpt_reply is not None: gpt_reply = close_up_code_segment_during_stream(gpt_reply)
+    # process
     y[-1] = (
         None if i_ask is None else markdown.markdown(i_ask, extensions=['fenced_code', 'tables']),
         None if gpt_reply is None else markdown_convertion(gpt_reply)
@@ -391,7 +400,7 @@ def extract_archive(file_path, dest_dir):
         print("Successfully extracted rar archive to {}".format(dest_dir))
     except:
         print("Rar format requires additional dependencies to install")
-        return '\n\n需要安装pip install rarfile来解压rar文件'
+        return '\n\n解压失败! 需要安装pip install rarfile来解压rar文件'
 
     # 第三方库,需要预先pip install py7zr
     elif file_extension == '.7z':
@@ -402,7 +411,7 @@ def extract_archive(file_path, dest_dir):
         print("Successfully extracted 7z archive to {}".format(dest_dir))
     except:
         print("7z format requires additional dependencies to install")
-        return '\n\n需要安装pip install py7zr来解压7z文件'
+        return '\n\n解压失败! 需要安装pip install py7zr来解压7z文件'
     else:
         return ''
     return ''
@@ -431,13 +440,17 @@ def find_recent_files(directory):
 
     return recent_files
 
-def promote_file_to_downloadzone(file, rename_file=None):
+def promote_file_to_downloadzone(file, rename_file=None, chatbot=None):
     # 将文件复制一份到下载区
     import shutil
     if rename_file is None: rename_file = f'{gen_time_str()}-{os.path.basename(file)}'
     new_path = os.path.join(f'./gpt_log/', rename_file)
-    if os.path.exists(new_path): os.remove(new_path)
-    shutil.copyfile(file, new_path)
+    if os.path.exists(new_path) and not os.path.samefile(new_path, file): os.remove(new_path)
+    if not os.path.exists(new_path): shutil.copyfile(file, new_path)
+    if chatbot:
+        if 'file_to_promote' in chatbot._cookies: current = chatbot._cookies['file_to_promote']
+        else: current = []
+        chatbot._cookies.update({'file_to_promote': [new_path] + current})
 
 def on_file_uploaded(files, chatbot, txt, txt2, checkboxes):
     """
@@ -477,14 +490,20 @@ def on_file_uploaded(files, chatbot, txt, txt2, checkboxes):
     return chatbot, txt, txt2
 
 
-def on_report_generated(files, chatbot):
+def on_report_generated(cookies, files, chatbot):
     from toolbox import find_recent_files
-    report_files = find_recent_files('gpt_log')
+    if 'file_to_promote' in cookies:
+        report_files = cookies['file_to_promote']
+        cookies.pop('file_to_promote')
+    else:
+        report_files = find_recent_files('gpt_log')
     if len(report_files) == 0:
         return None, chatbot
     # files.extend(report_files)
-    chatbot.append(['报告如何远程获取?', '报告已经添加到右侧“文件上传区”(可能处于折叠状态),请查收。'])
-    return report_files, chatbot
+    file_links = ''
+    for f in report_files: file_links += f'<br/><a href="file={os.path.abspath(f)}" target="_blank">{f}</a>'
+    chatbot.append(['报告如何远程获取?', f'报告已经添加到右侧“文件上传区”(可能处于折叠状态),请查收。{file_links}'])
+    return cookies, report_files, chatbot
 
 def is_openai_api_key(key):
     API_MATCH_ORIGINAL = re.match(r"sk-[a-zA-Z0-9]{48}$", key)
@@ -786,7 +805,8 @@ def zip_result(folder):
     import time
     t = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())
     zip_folder(folder, './gpt_log/', f'{t}-result.zip')
+    return pj('./gpt_log/', f'{t}-result.zip')
 
 def gen_time_str():
     import time
     return time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())
```
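The promote_file_to_downloadzone / on_report_generated changes above implement a simple handoff: plugins record each generated file in a per-session cookie dict, and the report callback later drains that list instead of scanning the whole gpt_log directory. A stand-alone sketch of the idea (plain dicts stand in for Gradio's per-session cookies; the names are illustrative):

```python
# Stand-alone sketch of the cookie-based file handoff (illustrative names;
# a plain dict stands in for Gradio's per-session cookies).
def promote_file(cookies, new_path):
    # Prepend the new file to the session's pending-download list.
    current = cookies.get('file_to_promote', [])
    cookies['file_to_promote'] = [new_path] + current

def on_report_generated(cookies):
    # Drain the pending list if present; otherwise fall back to a scan.
    if 'file_to_promote' in cookies:
        report_files = cookies.pop('file_to_promote')
    else:
        report_files = []  # fallback (the real code scans ./gpt_log here)
    return report_files

cookies = {}
promote_file(cookies, 'gpt_log/a.zip')
promote_file(cookies, 'gpt_log/b.md')
print(on_report_generated(cookies))   # newest first: ['gpt_log/b.md', 'gpt_log/a.zip']
print(on_report_generated(cookies))   # drained: []
```

Because the pending list lives in the session cookies rather than in a shared directory scan, concurrent users no longer pick up each other's reports.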
## version (4 changed lines)

```diff
@@ -1,5 +1,5 @@
 {
-    "version": 3.4,
+    "version": 3.42,
     "show_feature": true,
-    "new_feature": "新增最强Arxiv论文翻译插件 <-> 修复gradio复制按钮BUG <-> 修复PDF翻译的BUG, 新增HTML中英双栏对照 <-> 添加了OpenAI图片生成插件 <-> 添加了OpenAI音频转文本总结插件 <-> 通过Slack添加对Claude的支持"
+    "new_feature": "完善本地Latex矫错和翻译功能 <-> 增加gpt-3.5-16k的支持 <-> 新增最强Arxiv论文翻译插件 <-> 修复gradio复制按钮BUG <-> 修复PDF翻译的BUG, 新增HTML中英双栏对照 <-> 添加了OpenAI图片生成插件 <-> 添加了OpenAI音频转文本总结插件 <-> 通过Slack添加对Claude的支持"
 }
```