Mirrored from https://github.com/binary-husky/gpt_academic.git
Last synced: 2025-12-06 22:46:48 +00:00

Compare commits: 40 commits, version3.5 ... version3.5
Commits (SHA1):
34784333dc, 28d777a96b, c45fa88684, ad9807dd14, 2a51715075, 7c307d8964, baaacc5a7b, 6faf5947c9, 571335cbc4, 7d5abb6d69, a0f592308a, e512d99879, e70b636513, 408b8403fe, 74f8cb3511, 2202cf3701, cce69beee9, 347124c967, 77a6105a9a, 13c9606af7, bac6810e75, c176187d24, 31d5ee6ccc, 5e0dc9b9ad, 4c6f3aa427, d7331befc1, 63219baa21, 97cb9a4adc, 24f41b0a75, bfec29e9bc, dd9e624761, 7855325ff9, 2c039ff5c9, 9a5ee86434, d6698db257, b2d03bf2a3, d183e34461, fb78569335, 03164bcb6f, d052d425af
.github/workflows/build-with-all-capacity.yml (new vendored file, 44 lines)
```yaml
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
name: build-with-all-capacity

on:
  push:
    branches:
      - 'master'

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}_with_all_capacity

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Log in to the Container registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          file: docs/GithubAction+AllCapacity
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
```
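The workflow above derives the published image name by appending `_with_all_capacity` to `${{ github.repository }}`. A minimal sketch of the reference this produces, assuming the repository name taken from the mirror URL:

```python
registry = "ghcr.io"
repository = "binary-husky/gpt_academic"  # value of ${{ github.repository }} for this mirror
image = f"{registry}/{repository}_with_all_capacity"
print(image)  # ghcr.io/binary-husky/gpt_academic_with_all_capacity
# Using it locally would be: docker pull <image> (requires network access)
```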
```diff
@@ -10,7 +10,7 @@
 **如果喜欢这个项目,请给它一个Star;如果您发明了好用的快捷键或函数插件,欢迎发pull requests!**

 If you like this project, please give it a Star. If you've come up with more useful academic shortcuts or functional plugins, feel free to open an issue or pull request. We also have a README in [English|](docs/README_EN.md)[日本語|](docs/README_JP.md)[한국어|](https://github.com/mldljyh/ko_gpt_academic)[Русский|](docs/README_RS.md)[Français](docs/README_FR.md) translated by this project itself.
-To translate this project to arbitary language with GPT, read and run [`multi_language.py`](multi_language.py) (experimental).
+To translate this project to arbitrary language with GPT, read and run [`multi_language.py`](multi_language.py) (experimental).

 > **Note**
 >
```
```diff
@@ -54,7 +54,7 @@ Latex论文一键校对 | [函数插件] 仿Grammarly对Latex文章进行语法
 ⭐ChatGLM2微调模型 | 支持加载ChatGLM2微调模型,提供ChatGLM2微调辅助插件
 更多LLM模型接入,支持[huggingface部署](https://huggingface.co/spaces/qingxu98/gpt-academic) | 加入Newbing接口(新必应),引入清华[Jittorllms](https://github.com/Jittor/JittorLLMs)支持[LLaMA](https://github.com/facebookresearch/llama)和[盘古α](https://openi.org.cn/pangu/)
 ⭐[void-terminal](https://github.com/binary-husky/void-terminal) pip包 | 脱离GUI,在Python中直接调用本项目的所有函数插件(开发中)
-⭐虚空终端插件 | 用自然语言,直接调度本项目其他插件
+⭐虚空终端插件 | [函数插件] 用自然语言,直接调度本项目其他插件
 更多新功能展示 (图像生成等) …… | 见本文档结尾处 ……
 </div>

```
````diff
@@ -149,11 +149,14 @@ python main.py

 ### 安装方法II:使用Docker

+[](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-audio-assistant.yml)
+
 1. 仅ChatGPT(推荐大多数人选择,等价于docker-compose方案1)
 [](https://github.com/binary-husky/gpt_academic/actions/workflows/build-without-local-llms.yml)
 [](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-latex.yml)
 [](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-audio-assistant.yml)

+
 ``` sh
 git clone --depth=1 https://github.com/binary-husky/gpt_academic.git # 下载项目
 cd gpt_academic # 进入路径
````
```diff
@@ -252,7 +255,7 @@ Tip:不指定文件直接点击 `载入对话历史存档` 可以查看历史h

 3. 虚空终端(从自然语言输入中,理解用户意图+自动调用其他插件)

-- 步骤一:输入 “ 请调用插件翻译PDF论文,地址为https://www.nature.com/articles/s41586-019-1724-z.pdf ”
+- 步骤一:输入 “ 请调用插件翻译PDF论文,地址为https://openreview.net/pdf?id=rJl0r3R9KX ”
 - 步骤二:点击“虚空终端”

 <div align="center">
```
```diff
@@ -5,7 +5,7 @@ def check_proxy(proxies):
     try:
         response = requests.get("https://ipapi.co/json/", proxies=proxies, timeout=4)
         data = response.json()
-        print(f'查询代理的地理位置,返回的结果是{data}')
+        # print(f'查询代理的地理位置,返回的结果是{data}')
         if 'country_name' in data:
             country = data['country_name']
             result = f"代理配置 {proxies_https}, 代理所在地:{country}"
```
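The check_proxy hunk only silences a debug print; the surrounding function builds a result string from an ipapi.co-style JSON payload. A minimal offline sketch of that parsing step (the helper name and sample payload are illustrative, not from the repository):

```python
def describe_proxy_location(data: dict, proxies_https: str) -> str:
    # Mirrors check_proxy's branch: use country_name when the API returned one.
    if 'country_name' in data:
        return f"Proxy config {proxies_https}, proxy location: {data['country_name']}"
    return f"Proxy config {proxies_https}, location unknown"

sample = {"ip": "203.0.113.1", "country_name": "Japan"}  # illustrative payload
print(describe_proxy_location(sample, "http://localhost:7890"))
# Proxy config http://localhost:7890, proxy location: Japan
```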
```diff
@@ -501,6 +501,32 @@ def get_crazy_functions():
     except:
         print('Load function plugin failed')

+    try:
+        from crazy_functions.批量翻译PDF文档_NOUGAT import 批量翻译PDF文档
+        function_plugins.update({
+            "精准翻译PDF文档(NOUGAT)": {
+                "Group": "学术",
+                "Color": "stop",
+                "AsButton": False,
+                "Function": HotReload(批量翻译PDF文档)
+            }
+        })
+    except:
+        print('Load function plugin failed')
+
+
+    # try:
+    #     from crazy_functions.CodeInterpreter import 虚空终端CodeInterpreter
+    #     function_plugins.update({
+    #         "CodeInterpreter(开发中,仅供测试)": {
+    #             "Group": "编程|对话",
+    #             "Color": "stop",
+    #             "AsButton": False,
+    #             "Function": HotReload(虚空终端CodeInterpreter)
+    #         }
+    #     })
+    # except:
+    #     print('Load function plugin failed')
+
     # try:
     #     from crazy_functions.chatglm微调工具 import 微调数据集生成
```
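get_crazy_functions registers each plugin inside its own try/except so that a missing optional dependency skips that plugin instead of crashing startup. A minimal sketch of the same registration pattern (the registry helper and plugin names here are illustrative, not the project's real modules):

```python
function_plugins = {}

def register_plugin(name, importer, **meta):
    """Register a plugin only if its import succeeds; swallow failures as the repo does."""
    try:
        fn = importer()
    except Exception:
        print('Load function plugin failed')
        return
    function_plugins[name] = {**meta, "Function": fn}

# A plugin whose import works:
register_plugin("Echo", lambda: (lambda x: x), Group="demo", AsButton=False)

# A plugin with a missing dependency is skipped, not fatal:
def _missing_dep():
    raise ImportError("no such module")
register_plugin("Broken", _missing_dep, Group="demo")

print(sorted(function_plugins))  # ['Echo']
```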
crazy_functions/CodeInterpreter.py (new regular file, 231 lines)
````python
from collections.abc import Callable, Iterable, Mapping
from typing import Any
from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc, promote_file_to_downloadzone, clear_file_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import input_clipping, try_install_deps
from multiprocessing import Process, Pipe
import os
import time

templete = """
```python
import ...  # Put dependencies here, e.g. import numpy as np

class TerminalFunction(object): # Do not change the name of the class, The name of the class must be `TerminalFunction`

    def run(self, path): # The name of the function must be `run`, it takes only a positional argument.
        # rewrite the function you have just written here
        ...
        return generated_file_path
```
"""

def inspect_dependency(chatbot, history):
    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
    return True

def get_code_block(reply):
    import re
    pattern = r"```([\s\S]*?)```" # regex pattern to match code blocks
    matches = re.findall(pattern, reply) # find all code blocks in text
    if len(matches) == 1:
        return matches[0].strip('python') # code block
    for match in matches:
        if 'class TerminalFunction' in match:
            return match.strip('python') # code block
    raise RuntimeError("GPT is not generating proper code.")

def gpt_interact_multi_step(txt, file_type, llm_kwargs, chatbot, history):
    # 输入
    prompt_compose = [
        f'Your job:\n'
        f'1. write a single Python function, which takes a path of a `{file_type}` file as the only argument and returns a `string` containing the result of analysis or the path of generated files. \n',
        f"2. You should write this function to perform following task: " + txt + "\n",
        f"3. Wrap the output python function with markdown codeblock."
    ]
    i_say = "".join(prompt_compose)
    demo = []

    # 第一步
    gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
        inputs=i_say, inputs_show_user=i_say,
        llm_kwargs=llm_kwargs, chatbot=chatbot, history=demo,
        sys_prompt= r"You are a programmer."
    )
    history.extend([i_say, gpt_say])
    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 界面更新

    # 第二步
    prompt_compose = [
        "If previous stage is successful, rewrite the function you have just written to satisfy following templete: \n",
        templete
    ]
    i_say = "".join(prompt_compose); inputs_show_user = "If previous stage is successful, rewrite the function you have just written to satisfy executable templete. "
    gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
        inputs=i_say, inputs_show_user=inputs_show_user,
        llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
        sys_prompt= r"You are a programmer."
    )
    code_to_return = gpt_say
    history.extend([i_say, gpt_say])
    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 界面更新

    # # 第三步
    # i_say = "Please list to packages to install to run the code above. Then show me how to use `try_install_deps` function to install them."
    # i_say += 'For instance. `try_install_deps(["opencv-python", "scipy", "numpy"])`'
    # installation_advance = yield from request_gpt_model_in_new_thread_with_ui_alive(
    #     inputs=i_say, inputs_show_user=inputs_show_user,
    #     llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
    #     sys_prompt= r"You are a programmer."
    # )
    # # # 第三步
    # i_say = "Show me how to use `pip` to install packages to run the code above. "
    # i_say += 'For instance. `pip install -r opencv-python scipy numpy`'
    # installation_advance = yield from request_gpt_model_in_new_thread_with_ui_alive(
    #     inputs=i_say, inputs_show_user=i_say,
    #     llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
    #     sys_prompt= r"You are a programmer."
    # )
    installation_advance = ""

    return code_to_return, installation_advance, txt, file_type, llm_kwargs, chatbot, history

def make_module(code):
    module_file = 'gpt_fn_' + gen_time_str().replace('-','_')
    with open(f'gpt_log/{module_file}.py', 'w', encoding='utf8') as f:
        f.write(code)

    def get_class_name(class_string):
        import re
        # Use regex to extract the class name
        class_name = re.search(r'class (\w+)\(', class_string).group(1)
        return class_name

    class_name = get_class_name(code)
    return f"gpt_log.{module_file}->{class_name}"

def init_module_instance(module):
    import importlib
    module_, class_ = module.split('->')
    init_f = getattr(importlib.import_module(module_), class_)
    return init_f()

def for_immediate_show_off_when_possible(file_type, fp, chatbot):
    if file_type in ['png', 'jpg']:
        image_path = os.path.abspath(fp)
        chatbot.append(['这是一张图片, 展示如下:',
            f'本地文件地址: <br/>`{image_path}`<br/>'+
            f'本地文件预览: <br/><div align="center"><img src="file={image_path}"></div>'
        ])
    return chatbot

def subprocess_worker(instance, file_path, return_dict):
    return_dict['result'] = instance.run(file_path)

def have_any_recent_upload_files(chatbot):
    _5min = 5 * 60
    if not chatbot: return False # chatbot is None
    most_recent_uploaded = chatbot._cookies.get("most_recent_uploaded", None)
    if not most_recent_uploaded: return False # most_recent_uploaded is None
    if time.time() - most_recent_uploaded["time"] < _5min: return True # most_recent_uploaded is new
    else: return False # most_recent_uploaded is too old

def get_recent_file_prompt_support(chatbot):
    most_recent_uploaded = chatbot._cookies.get("most_recent_uploaded", None)
    path = most_recent_uploaded['path']
    return path

@CatchException
def 虚空终端CodeInterpreter(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
    """
    txt             输入栏用户输入的文本,例如需要翻译的一段话,再例如一个包含了待处理文件的路径
    llm_kwargs      gpt模型参数,如温度和top_p等,一般原样传递下去就行
    plugin_kwargs   插件模型的参数,暂时没有用武之地
    chatbot         聊天显示框的句柄,用于显示给用户
    history         聊天历史,前情提要
    system_prompt   给gpt的静默提醒
    web_port        当前软件运行的端口号
    """
    raise NotImplementedError

    # 清空历史,以免输入溢出
    history = []; clear_file_downloadzone(chatbot)

    # 基本信息:功能、贡献者
    chatbot.append([
        "函数插件功能?",
        "CodeInterpreter开源版, 此插件处于开发阶段, 建议暂时不要使用, 插件初始化中 ..."
    ])
    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面

    if have_any_recent_upload_files(chatbot):
        file_path = get_recent_file_prompt_support(chatbot)
    else:
        chatbot.append(["文件检索", "没有发现任何近期上传的文件。"])
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面

    # 读取文件
    if ("recently_uploaded_files" in plugin_kwargs) and (plugin_kwargs["recently_uploaded_files"] == ""): plugin_kwargs.pop("recently_uploaded_files")
    recently_uploaded_files = plugin_kwargs.get("recently_uploaded_files", None)
    file_path = recently_uploaded_files[-1]
    file_type = file_path.split('.')[-1]

    # 粗心检查
    if 'private_upload' in txt:
        chatbot.append([
            "...",
            f"请在输入框内填写需求,然后再次点击该插件(文件路径 {file_path} 已经被记忆)"
        ])
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
        return

    # 开始干正事
    for j in range(5):  # 最多重试5次
        try:
            code, installation_advance, txt, file_type, llm_kwargs, chatbot, history = \
                yield from gpt_interact_multi_step(txt, file_type, llm_kwargs, chatbot, history)
            code = get_code_block(code)
            res = make_module(code)
            instance = init_module_instance(res)
            break
        except Exception as e:
            chatbot.append([f"第{j}次代码生成尝试,失败了", f"错误追踪\n```\n{trimmed_format_exc()}\n```\n"])
            yield from update_ui(chatbot=chatbot, history=history) # 刷新界面

    # 代码生成结束, 开始执行
    try:
        import multiprocessing
        manager = multiprocessing.Manager()
        return_dict = manager.dict()

        p = multiprocessing.Process(target=subprocess_worker, args=(instance, file_path, return_dict))
        # only has 10 seconds to run
        p.start(); p.join(timeout=10)
        if p.is_alive(): p.terminate(); p.join()
        p.close()
        res = return_dict['result']
        # res = instance.run(file_path)
    except Exception as e:
        chatbot.append(["执行失败了", f"错误追踪\n```\n{trimmed_format_exc()}\n```\n"])
        # chatbot.append(["如果是缺乏依赖,请参考以下建议", installation_advance])
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
        return

    # 顺利完成,收尾
    res = str(res)
    if os.path.exists(res):
        chatbot.append(["执行成功了,结果是一个有效文件", "结果:" + res])
        new_file_path = promote_file_to_downloadzone(res, chatbot=chatbot)
        chatbot = for_immediate_show_off_when_possible(file_type, new_file_path, chatbot)
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 界面更新
    else:
        chatbot.append(["执行成功了,结果是一个字符串", "结果:" + res])
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 界面更新

"""
测试:
    裁剪图像,保留下半部分
    交换图像的蓝色通道和红色通道
    将图像转为灰度图像
    将csv文件转excel表格
"""
````
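A caveat on get_code_block above: `matches[0].strip('python')` strips *characters* from the set `p, y, t, h, o, n` at both ends of the string, not the literal prefix `python` — it would also eat, say, a leading `p` of real code. A sketch of a safer extraction (the helper name is ours, not the repository's):

```python
import re

def extract_code_block(reply: str) -> str:
    """Pull the body of the first fenced block, dropping an optional language tag."""
    matches = re.findall(r"```([\s\S]*?)```", reply)
    if not matches:
        raise RuntimeError("no fenced code block found")
    body = matches[0]
    # Remove only a leading language tag such as 'python', not arbitrary characters.
    if body.startswith("python"):
        body = body[len("python"):]
    return body.strip("\n")

reply = "Here you go:\n```python\nprint('hi')\n```"
print(extract_code_block(reply))  # print('hi')
```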
```diff
@@ -109,7 +109,7 @@ def arxiv_download(chatbot, history, txt):

     url_ = txt   # https://arxiv.org/abs/1707.06690
     if not txt.startswith('https://arxiv.org/abs/'):
-        msg = f"解析arxiv网址失败, 期望格式例如: https://arxiv.org/abs/1707.06690。实际得到格式: {url_}"
+        msg = f"解析arxiv网址失败, 期望格式例如: https://arxiv.org/abs/1707.06690。实际得到格式: {url_}。"
         yield from update_ui_lastest_msg(msg, chatbot=chatbot, history=history) # 刷新界面
         return msg, None
     # <-------------- set format ------------->
```
```diff
@@ -255,7 +255,7 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
         project_folder = txt
     else:
         if txt == "": txt = '空空如也的输入栏'
-        report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
+        report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无法处理: {txt}")
         yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
         return
```
```diff
@@ -469,6 +469,7 @@ def read_and_clean_pdf_text(fp):
                               '- ', '') for t in text_areas['blocks'] if 'lines' in t]

     ############################## <第 2 步,获取正文主字体> ##################################
+    try:
     fsize_statiscs = {}
     for span in meta_span:
         if span[1] not in fsize_statiscs: fsize_statiscs[span[1]] = 0
@@ -476,7 +477,8 @@ def read_and_clean_pdf_text(fp):
     main_fsize = max(fsize_statiscs, key=fsize_statiscs.get)
     if REMOVE_FOOT_NOTE:
         give_up_fize_threshold = main_fsize * REMOVE_FOOT_FFSIZE_PERCENT
+    except:
+        raise RuntimeError(f'抱歉, 我们暂时无法解析此PDF文档: {fp}。')
     ############################## <第 3 步,切分和重新整合> ##################################
     mega_sec = []
     sec = []
```
```diff
@@ -591,11 +593,16 @@ def get_files_from_everything(txt, type): # type='.md'
         # 网络的远程文件
         import requests
         from toolbox import get_conf
+        from toolbox import get_log_folder, gen_time_str
         proxies, = get_conf('proxies')
-        r = requests.get(txt, proxies=proxies)
-        with open('./gpt_log/temp'+type, 'wb+') as f: f.write(r.content)
-        project_folder = './gpt_log/'
-        file_manifest = ['./gpt_log/temp'+type]
+        try:
+            r = requests.get(txt, proxies=proxies)
+        except:
+            raise ConnectionRefusedError(f"无法下载资源{txt},请检查。")
+        path = os.path.join(get_log_folder(plugin_name='web_download'), gen_time_str()+type)
+        with open(path, 'wb+') as f: f.write(r.content)
+        project_folder = get_log_folder(plugin_name='web_download')
+        file_manifest = [path]
     elif txt.endswith(type):
         # 直接给定文件
         file_manifest = [txt]
```
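The get_files_from_everything change replaces the fixed `./gpt_log/temp<ext>` download target with a per-download, timestamped path, so concurrent downloads no longer overwrite one another. A minimal offline sketch of that naming scheme (`get_log_folder` and `gen_time_str` below are simplified stand-ins for the repo's toolbox helpers):

```python
import os, time

def get_log_folder(plugin_name):
    # Stand-in for toolbox.get_log_folder: one log subfolder per plugin.
    return os.path.join("gpt_log", plugin_name)

def gen_time_str():
    # Stand-in for toolbox.gen_time_str: a timestamp usable in file names.
    return time.strftime("%Y-%m-%d-%H-%M-%S")

def download_target(ext: str) -> str:
    return os.path.join(get_log_folder(plugin_name='web_download'), gen_time_str() + ext)

print(download_target('.pdf'))  # e.g. gpt_log/web_download/2023-09-01-12-00-00.pdf
```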
```diff
@@ -423,7 +423,7 @@ def compile_latex_with_timeout(command, cwd, timeout=60):

 def merge_pdfs(pdf1_path, pdf2_path, output_path):
     import PyPDF2
-    Percent = 0.8
+    Percent = 0.95
     # Open the first PDF file
     with open(pdf1_path, 'rb') as pdf1_file:
         pdf1_reader = PyPDF2.PdfFileReader(pdf1_file)
```
```diff
@@ -1,4 +1,4 @@
-import time, threading, json
+import time, logging, json


 class AliyunASR():
@@ -12,14 +12,14 @@ class AliyunASR():
         message = json.loads(message)
         self.parsed_sentence = message['payload']['result']
         self.event_on_entence_end.set()
-        print(self.parsed_sentence)
+        # print(self.parsed_sentence)

     def test_on_start(self, message, *args):
         # print("test_on_start:{}".format(message))
         pass

     def test_on_error(self, message, *args):
-        print("on_error args=>{}".format(args))
+        logging.error("on_error args=>{}".format(args))
         pass

     def test_on_close(self, *args):
@@ -36,7 +36,6 @@ class AliyunASR():
         # print("on_completed:args=>{} message=>{}".format(args, message))
         pass

-
     def audio_convertion_thread(self, uuid):
         # 在一个异步线程中采集音频
         import nls # pip install git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git
```
```diff
@@ -20,6 +20,11 @@ def get_avail_grobid_url():

 def parse_pdf(pdf_path, grobid_url):
     import scipdf # pip install scipdf_parser
     if grobid_url.endswith('/'): grobid_url = grobid_url.rstrip('/')
-    article_dict = scipdf.parse_pdf_to_dict(pdf_path, grobid_url=grobid_url)
+    try:
+        article_dict = scipdf.parse_pdf_to_dict(pdf_path, grobid_url=grobid_url)
+    except GROBID_OFFLINE_EXCEPTION:
+        raise GROBID_OFFLINE_EXCEPTION("GROBID服务不可用,请修改config中的GROBID_URL,可修改成本地GROBID服务。")
+    except:
+        raise RuntimeError("解析PDF失败,请检查PDF是否损坏。")
     return article_dict
```
crazy_functions/批量翻译PDF文档_NOUGAT.py (new regular file, 271 lines; listing truncated)
|
|||||||
|
from toolbox import CatchException, report_execption, gen_time_str
|
||||||
|
from toolbox import update_ui, promote_file_to_downloadzone, update_ui_lastest_msg, disable_auto_promotion
|
||||||
|
from toolbox import write_history_to_file, get_log_folder
|
||||||
|
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
|
||||||
|
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
|
||||||
|
from .crazy_utils import read_and_clean_pdf_text
|
||||||
|
from .pdf_fns.parse_pdf import parse_pdf, get_avail_grobid_url
|
||||||
|
from colorful import *
|
||||||
|
import os
|
||||||
|
import math
|
||||||
|
import logging
|
||||||
|
|
||||||
|
def markdown_to_dict(article_content):
|
||||||
|
import markdown
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
cur_t = ""
|
||||||
|
cur_c = ""
|
||||||
|
results = {}
|
||||||
|
for line in article_content:
|
||||||
|
if line.startswith('#'):
|
||||||
|
if cur_t!="":
|
||||||
|
if cur_t not in results:
|
||||||
|
results.update({cur_t:cur_c.lstrip('\n')})
|
||||||
|
else:
|
||||||
|
# 处理重名的章节
|
||||||
|
results.update({cur_t + " " + gen_time_str():cur_c.lstrip('\n')})
|
||||||
|
cur_t = line.rstrip('\n')
|
||||||
|
cur_c = ""
|
||||||
|
else:
|
||||||
|
cur_c += line
|
||||||
|
results_final = {}
|
||||||
|
for k in list(results.keys()):
|
||||||
|
if k.startswith('# '):
|
||||||
|
results_final['title'] = k.split('# ')[-1]
|
||||||
|
results_final['authors'] = results.pop(k).lstrip('\n')
|
||||||
|
if k.startswith('###### Abstract'):
|
||||||
|
results_final['abstract'] = results.pop(k).lstrip('\n')
|
||||||
|
|
||||||
|
results_final_sections = []
|
||||||
|
for k,v in results.items():
|
||||||
|
results_final_sections.append({
|
||||||
|
'heading':k.lstrip("# "),
|
||||||
|
'text':v if len(v) > 0 else f"The beginning of {k.lstrip('# ')} section."
|
||||||
|
})
|
||||||
|
results_final['sections'] = results_final_sections
|
||||||
|
return results_final
|
||||||
|
|
||||||
|
|
||||||
|
@CatchException
|
||||||
|
def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
|
||||||
|
|
||||||
|
disable_auto_promotion(chatbot)
|
||||||
|
# 基本信息:功能、贡献者
|
||||||
|
chatbot.append([
|
||||||
|
"函数插件功能?",
|
||||||
|
"批量翻译PDF文档。函数插件贡献者: Binary-Husky"])
|
||||||
|
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
|
||||||
|
|
||||||
|
# 尝试导入依赖,如果缺少依赖,则给出安装建议
|
||||||
|
try:
|
||||||
|
import nougat
|
||||||
|
import tiktoken
|
||||||
|
except:
|
||||||
|
report_execption(chatbot, history,
|
||||||
|
a=f"解析项目: {txt}",
|
||||||
|
b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade nougat-ocr tiktoken```。")
|
||||||
|
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
|
||||||
|
return
|
||||||
|
|
||||||
|
# 清空历史,以免输入溢出
|
||||||
|
history = []
|
||||||
|
|
||||||
|
from .crazy_utils import get_files_from_everything
|
||||||
|
success, file_manifest, project_folder = get_files_from_everything(txt, type='.pdf')
|
||||||
|
# 检测输入参数,如没有给定输入参数,直接退出
|
||||||
|
if not success:
|
||||||
|
if txt == "": txt = '空空如也的输入栏'
|
||||||
|
|
||||||
|
# 如果没找到任何文件
|
||||||
|
if len(file_manifest) == 0:
|
||||||
|
report_execption(chatbot, history,
|
||||||
|
a=f"解析项目: {txt}", b=f"找不到任何.tex或.pdf文件: {txt}")
|
||||||
|
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
|
||||||
|
return
|
||||||
|
|
||||||
|
# 开始正式执行任务
|
||||||
|
yield from 解析PDF_基于NOUGAT(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
|
||||||
|
|
||||||
|
|
||||||
|
def nougat_with_timeout(command, cwd, timeout=3600):
|
||||||
|
import subprocess
|
||||||
|
process = subprocess.Popen(command, shell=True, cwd=cwd)
|
||||||
|
try:
|
||||||
|
stdout, stderr = process.communicate(timeout=timeout)
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
process.kill()
|
||||||
|
stdout, stderr = process.communicate()
|
||||||
|
print("Process timed out!")
|
||||||
|
return False
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
def NOUGAT_parse_pdf(fp):
|
||||||
|
import glob
|
||||||
|
from toolbox import get_log_folder, gen_time_str
|
||||||
|
dst = os.path.join(get_log_folder(plugin_name='nougat'), gen_time_str())
|
||||||
|
os.makedirs(dst)
|
||||||
|
nougat_with_timeout(f'nougat --out "{os.path.abspath(dst)}" "{os.path.abspath(fp)}"', os.getcwd())
|
||||||
|
res = glob.glob(os.path.join(dst,'*.mmd'))
|
||||||
|
if len(res) == 0:
|
||||||
|
raise RuntimeError("Nougat解析论文失败。")
|
||||||
|
return res[0]
|
||||||
|
|
||||||
|
|
||||||
|
def 解析PDF_基于NOUGAT(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
    import copy
    import tiktoken
    TOKEN_LIMIT_PER_FRAGMENT = 1280
    generated_conclusion_files = []
    generated_html_files = []
    DST_LANG = "中文"
    for index, fp in enumerate(file_manifest):
        chatbot.append(["当前进度:", f"正在解析论文,请稍候。(第一次运行时,需要花费较长时间下载NOUGAT参数)"]); yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI
        fpp = NOUGAT_parse_pdf(fp)

        with open(fpp, 'r', encoding='utf8') as f:
            article_content = f.readlines()
        article_dict = markdown_to_dict(article_content)
        logging.info(article_dict)

        prompt = "以下是一篇学术论文的基本信息:\n"
        # title
        title = article_dict.get('title', '无法获取 title'); prompt += f'title:{title}\n\n'
        # authors
        authors = article_dict.get('authors', '无法获取 authors'); prompt += f'authors:{authors}\n\n'
        # abstract
        abstract = article_dict.get('abstract', '无法获取 abstract'); prompt += f'abstract:{abstract}\n\n'
        # command
        prompt += f"请将题目和摘要翻译为{DST_LANG}。"
        meta = [f'# Title:\n\n', title, f'# Abstract:\n\n', abstract]

        # single thread: fetch the paper's meta information
        paper_meta_info = yield from request_gpt_model_in_new_thread_with_ui_alive(
            inputs=prompt,
            inputs_show_user=prompt,
            llm_kwargs=llm_kwargs,
            chatbot=chatbot, history=[],
            sys_prompt="You are an academic paper reader.",
        )

        # multi-threaded translation
        inputs_array = []
        inputs_show_user_array = []

        # get_token_num
        from request_llm.bridge_all import model_info
        enc = model_info[llm_kwargs['llm_model']]['tokenizer']
        def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
        from .crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf

        def break_down(txt):
            raw_token_num = get_token_num(txt)
            if raw_token_num <= TOKEN_LIMIT_PER_FRAGMENT:
                return [txt]
            else:
                # raw_token_num > TOKEN_LIMIT_PER_FRAGMENT
                # find a smooth token limit to achieve even separation
                count = int(math.ceil(raw_token_num / TOKEN_LIMIT_PER_FRAGMENT))
                token_limit_smooth = raw_token_num // count + count
                return breakdown_txt_to_satisfy_token_limit_for_pdf(txt, get_token_fn=get_token_num, limit=token_limit_smooth)
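The break_down helper above smooths the per-fragment token limit so the pieces come out nearly equal instead of leaving a tiny trailing fragment. The arithmetic in isolation, with plain numbers standing in for real tokenizer counts:

```python
import math

TOKEN_LIMIT_PER_FRAGMENT = 1280

def smooth_limit(raw_token_num, limit=TOKEN_LIMIT_PER_FRAGMENT):
    # number of fragments needed at the hard limit
    count = int(math.ceil(raw_token_num / limit))
    # slightly padded per-fragment limit so fragments come out nearly equal
    return raw_token_num // count + count

# e.g. 3000 tokens -> 3 fragments of ~1000 each instead of 1280 + 1280 + 440
print(smooth_limit(3000))  # 1003
```

The `+ count` padding absorbs integer-division rounding so the final fragment never overflows the smoothed limit.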
        for section in article_dict.get('sections'):
            if len(section['text']) == 0: continue
            section_frags = break_down(section['text'])
            for i, fragment in enumerate(section_frags):
                heading = section['heading']
                if len(section_frags) > 1: heading += f' Part-{i+1}'
                inputs_array.append(
                    f"你需要翻译{heading}章节,内容如下: \n\n{fragment}"
                )
                inputs_show_user_array.append(
                    f"# {heading}\n\n{fragment}"
                )

        gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
            inputs_array=inputs_array,
            inputs_show_user_array=inputs_show_user_array,
            llm_kwargs=llm_kwargs,
            chatbot=chatbot,
            history_array=[meta for _ in inputs_array],
            sys_prompt_array=[
                "请你作为一个学术翻译,负责把学术论文准确翻译成中文。注意文章中的每一句话都要翻译。" for _ in inputs_array],
        )
        res_path = write_history_to_file(meta + ["# Meta Translation", paper_meta_info] + gpt_response_collection, file_basename=None, file_fullname=None)
        promote_file_to_downloadzone(res_path, rename_file=os.path.basename(fp)+'.md', chatbot=chatbot)
        generated_conclusion_files.append(res_path)

        ch = construct_html()
        orig = ""
        trans = ""
        gpt_response_collection_html = copy.deepcopy(gpt_response_collection)
        for i, k in enumerate(gpt_response_collection_html):
            if i % 2 == 0:
                gpt_response_collection_html[i] = inputs_show_user_array[i//2]
            else:
                gpt_response_collection_html[i] = gpt_response_collection_html[i]

        final = ["", "", "一、论文概况", "", "Abstract", paper_meta_info, "二、论文翻译", ""]
        final.extend(gpt_response_collection_html)
        for i, k in enumerate(final):
            if i % 2 == 0:
                orig = k
            if i % 2 == 1:
                trans = k
                ch.add_row(a=orig, b=trans)
        create_report_file_name = f"{os.path.basename(fp)}.trans.html"
        html_file = ch.save_file(create_report_file_name)
        generated_html_files.append(html_file)
        promote_file_to_downloadzone(html_file, rename_file=os.path.basename(html_file), chatbot=chatbot)

    chatbot.append(("给出输出文件清单", str(generated_conclusion_files + generated_html_files)))
    yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI
class construct_html():
    def __init__(self) -> None:
        self.css = """
.row {
  display: flex;
  flex-wrap: wrap;
}

.column {
  flex: 1;
  padding: 10px;
}

.table-header {
  font-weight: bold;
  border-bottom: 1px solid black;
}

.table-row {
  border-bottom: 1px solid lightgray;
}

.table-cell {
  padding: 5px;
}
"""
        self.html_string = f'<!DOCTYPE html><head><meta charset="utf-8"><title>翻译结果</title><style>{self.css}</style></head>'

    def add_row(self, a, b):
        tmp = """
<div class="row table-row">
    <div class="column table-cell">REPLACE_A</div>
    <div class="column table-cell">REPLACE_B</div>
</div>
"""
        from toolbox import markdown_convertion
        tmp = tmp.replace('REPLACE_A', markdown_convertion(a))
        tmp = tmp.replace('REPLACE_B', markdown_convertion(b))
        self.html_string += tmp

    def save_file(self, file_name):
        with open(os.path.join(get_log_folder(), file_name), 'w', encoding='utf8') as f:
            f.write(self.html_string.encode('utf-8', 'ignore').decode())
        return os.path.join(get_log_folder(), file_name)
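The add_row method above splices each (original, translation) pair into a two-column flex-row template via REPLACE_A/REPLACE_B substitution. A minimal standalone sketch of the same pattern, with the stdlib's html.escape standing in for the project's markdown_convertion:

```python
import html

# same row shape as construct_html.add_row above
ROW_TEMPLATE = """
<div class="row table-row">
    <div class="column table-cell">REPLACE_A</div>
    <div class="column table-cell">REPLACE_B</div>
</div>
"""

def build_rows(pairs):
    # concatenate one table row per (original, translation) pair
    out = ""
    for a, b in pairs:
        out += ROW_TEMPLATE.replace('REPLACE_A', html.escape(a)).replace('REPLACE_B', html.escape(b))
    return out

doc = build_rows([("Hello", "你好"), ("World", "世界")])
```

String substitution rather than a templating engine keeps the class dependency-free; the real code converts markdown before substituting, which this sketch deliberately skips.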
@@ -24,10 +24,11 @@ def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
     try:
         import fitz
         import tiktoken
+        import scipdf
     except:
         report_execption(chatbot, history,
                          a=f"解析项目: {txt}",
-                         b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade pymupdf tiktoken```。")
+                         b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade pymupdf tiktoken scipdf_parser```。")
         yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI
         return
@@ -58,7 +59,6 @@ def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst


 def 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, grobid_url):
     import copy
-    import tiktoken
     TOKEN_LIMIT_PER_FRAGMENT = 1280
     generated_conclusion_files = []
     generated_html_files = []
@@ -66,7 +66,7 @@ def 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwa
     for index, fp in enumerate(file_manifest):
         chatbot.append(["当前进度:", f"正在连接GROBID服务,请稍候: {grobid_url}\n如果等待时间过长,请修改config中的GROBID_URL,可修改成本地GROBID服务。"]); yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI
         article_dict = parse_pdf(fp, grobid_url)
-        print(article_dict)
+        if article_dict is None: raise RuntimeError("解析PDF失败,请检查PDF是否损坏。")
         prompt = "以下是一篇学术论文的基本信息:\n"
         # title
         title = article_dict.get('title', '无法获取 title'); prompt += f'title:{title}\n\n'
@@ -24,12 +24,13 @@ explain_msg = """
 ## 虚空终端插件说明:

 1. 请用**自然语言**描述您需要做什么。例如:
-    - 「请调用插件,为我翻译PDF论文,论文我刚刚放到上传区了。」
-    - 「请调用插件翻译PDF论文,地址为https://www.nature.com/articles/s41586-019-1724-z.pdf」
-    - 「生成一张图片,图中鲜花怒放,绿草如茵,用插件实现。」
+    - 「请调用插件,为我翻译PDF论文,论文我刚刚放到上传区了」
+    - 「请调用插件翻译PDF论文,地址为https://openreview.net/pdf?id=rJl0r3R9KX」
+    - 「把Arxiv论文翻译成中文PDF,arxiv论文的ID是1812.10695,记得用插件!」
+    - 「生成一张图片,图中鲜花怒放,绿草如茵,用插件实现」
     - 「用插件翻译README,Github网址是https://github.com/facebookresearch/co-tracker」
-    - 「给爷翻译Arxiv论文,arxiv论文的ID是1812.10695,记得用插件,不要自己瞎搞!」
-    - 「我不喜欢当前的界面颜色,修改配置,把主题THEME更换为THEME="High-Contrast"。」
+    - 「我不喜欢当前的界面颜色,修改配置,把主题THEME更换为THEME="High-Contrast"」
+    - 「请调用插件,解析python源代码项目,代码我刚刚打包拖到上传区了」
     - 「请问Transformer网络的结构是怎样的?」

 2. 您可以打开插件下拉菜单以了解本项目的各种能力。
@@ -1,12 +1,13 @@
-from toolbox import update_ui
-from toolbox import CatchException, report_execption, write_results_to_file
+from toolbox import update_ui, promote_file_to_downloadzone, disable_auto_promotion
+from toolbox import CatchException, report_execption, write_history_to_file
 from .crazy_utils import input_clipping

 def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
     import os, copy
     from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
     from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
-    msg = '正常'
+    disable_auto_promotion(chatbot=chatbot)
+
     summary_batch_isolation = True
     inputs_array = []
     inputs_show_user_array = []
@@ -43,7 +44,8 @@ def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
     # all files parsed; write the results to file and prepare to summarize the project source code
     report_part_1 = copy.deepcopy(gpt_response_collection)
     history_to_return = report_part_1
-    res = write_results_to_file(report_part_1)
+    res = write_history_to_file(report_part_1)
+    promote_file_to_downloadzone(res, chatbot=chatbot)
     chatbot.append(("完成?", "逐个文件分析已完成。" + res + "\n\n正在开始汇总。"))
     yield from update_ui(chatbot=chatbot, history=history_to_return)  # refresh the UI
@@ -97,7 +99,8 @@ def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs,

     ############################## <END> ##################################
     history_to_return.extend(report_part_2)
-    res = write_results_to_file(history_to_return)
+    res = write_history_to_file(history_to_return)
+    promote_file_to_downloadzone(res, chatbot=chatbot)
     chatbot.append(("完成了吗?", res))
     yield from update_ui(chatbot=chatbot, history=history_to_return)  # refresh the UI
@@ -80,9 +80,9 @@ class InterviewAssistant(AliyunASR):
     def __init__(self):
         self.capture_interval = 0.5  # second
         self.stop = False
-        self.parsed_text = ""
-        self.parsed_sentence = ""
-        self.buffered_sentence = ""
+        self.parsed_text = ""  # the already-finished part of the current sentence, written by test_on_result_chg()
+        self.parsed_sentence = ""  # the whole sentence of a segment, written by test_on_sentence_end()
+        self.buffered_sentence = ""
         self.event_on_result_chg = threading.Event()
         self.event_on_entence_end = threading.Event()
         self.event_on_commit_question = threading.Event()
@@ -132,7 +132,7 @@ class InterviewAssistant(AliyunASR):
             self.plugin_wd.feed()

             if self.event_on_result_chg.is_set():
-                # update audio decode result
+                # called when some words have finished
                 self.event_on_result_chg.clear()
                 chatbot[-1] = list(chatbot[-1])
                 chatbot[-1][0] = self.buffered_sentence + self.parsed_text
@@ -144,7 +144,11 @@ class InterviewAssistant(AliyunASR):
                 # called when a sentence has ended
                 self.event_on_entence_end.clear()
                 self.parsed_text = self.parsed_sentence
-                self.buffered_sentence += self.parsed_sentence
+                self.buffered_sentence += self.parsed_text
+                chatbot[-1] = list(chatbot[-1])
+                chatbot[-1][0] = self.buffered_sentence
+                history = chatbot2history(chatbot)
+                yield from update_ui(chatbot=chatbot, history=history)  # refresh the UI

             if self.event_on_commit_question.is_set():
                 # called when a question should be committed
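The loop above consumes flags raised by the ASR callbacks: a callback writes a shared field and sets a threading.Event, and the UI loop clears the event before consuming the field. A stripped-down sketch of that handshake; the class and method names here are illustrative, not the project's:

```python
import threading

class SentenceBuffer:
    def __init__(self):
        self.parsed_sentence = ""
        self.buffered_sentence = ""
        self.event_on_sentence_end = threading.Event()

    def on_sentence_end(self, sentence):
        # producer side: store the result first, then raise the flag
        self.parsed_sentence = sentence
        self.event_on_sentence_end.set()

    def poll(self):
        # consumer side: clear the flag before consuming, as the loop above does
        if self.event_on_sentence_end.is_set():
            self.event_on_sentence_end.clear()
            self.buffered_sentence += self.parsed_sentence
            return True
        return False

buf = SentenceBuffer()
t = threading.Thread(target=buf.on_sentence_end, args=("hello ",))
t.start(); t.join()
buf.poll()
```

Clearing the event before reading means a callback firing during consumption simply re-raises the flag, so nothing is silently dropped.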
@@ -1,26 +1,75 @@
 from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
-from toolbox import CatchException, report_execption, write_results_to_file
-from toolbox import update_ui
+from toolbox import CatchException, report_execption, promote_file_to_downloadzone
+from toolbox import update_ui, update_ui_lastest_msg, disable_auto_promotion, write_history_to_file
+import logging
+import requests
+import time
+import random
+
+ENABLE_ALL_VERSION_SEARCH = True
+
 def get_meta_information(url, chatbot, history):
-    import requests
     import arxiv
     import difflib
+    import re
     from bs4 import BeautifulSoup
     from toolbox import get_conf
+    from urllib.parse import urlparse
+    session = requests.session()
+
     proxies, = get_conf('proxies')
     headers = {
-        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36',
+        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36',
+        'Accept-Encoding': 'gzip, deflate, br',
+        'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
+        'Cache-Control': 'max-age=0',
+        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
+        'Connection': 'keep-alive'
     }
-    # send a GET request
-    response = requests.get(url, proxies=proxies, headers=headers)
+    session.proxies.update(proxies)
+    session.headers.update(headers)
+
+    response = session.get(url)
     # parse the page content
     soup = BeautifulSoup(response.text, "html.parser")

     def string_similar(s1, s2):
         return difflib.SequenceMatcher(None, s1, s2).quick_ratio()

+    if ENABLE_ALL_VERSION_SEARCH:
+        def search_all_version(url):
+            time.sleep(random.randint(1, 5))  # sleep a moment to avoid triggering Google's anti-crawler
+            response = session.get(url)
+            soup = BeautifulSoup(response.text, "html.parser")
+
+            for result in soup.select(".gs_ri"):
+                try:
+                    url = result.select_one(".gs_rt").a['href']
+                except:
+                    continue
+                arxiv_id = extract_arxiv_id(url)
+                if not arxiv_id:
+                    continue
+                search = arxiv.Search(
+                    id_list=[arxiv_id],
+                    max_results=1,
+                    sort_by=arxiv.SortCriterion.Relevance,
+                )
+                try: paper = next(search.results())
+                except: paper = None
+                return paper
+
+            return None
+
+        def extract_arxiv_id(url):
+            # return the arxiv_id parsed from the given url, or None if the url does not match
+            pattern = r'arxiv.org/abs/([^/]+)'
+            match = re.search(pattern, url)
+            if match:
+                return match.group(1)
+            else:
+                return None
+
     profile = []
     # collect the title and authors of every article
     for result in soup.select(".gs_ri"):
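The extract_arxiv_id helper added in the hunk above is a plain regular-expression match on abstract-page URLs. Shown standalone with two sample inputs:

```python
import re

def extract_arxiv_id(url):
    # return the arxiv id embedded in an /abs/ URL, or None if there is none
    pattern = r'arxiv.org/abs/([^/]+)'
    match = re.search(pattern, url)
    return match.group(1) if match else None

print(extract_arxiv_id('https://arxiv.org/abs/1812.10695'))  # → 1812.10695
print(extract_arxiv_id('https://example.com/paper'))         # → None
```

The `[^/]+` capture stops at the next slash, so version suffixes such as `1812.10695v2` are kept as part of the id.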
@@ -31,32 +80,45 @@ def get_meta_information(url, chatbot, history):
         except:
             citation = 'cited by 0'
         abstract = result.select_one(".gs_rs").text.strip()  # the abstract text lives in .gs_rs; strip leading/trailing whitespace
+
+        # first search arxiv to fetch the paper's abstract
         search = arxiv.Search(
             query = title,
             max_results = 1,
             sort_by = arxiv.SortCriterion.Relevance,
         )
-        try:
-            paper = next(search.results())
-            if string_similar(title, paper.title) > 0.90:  # same paper
-                abstract = paper.summary.replace('\n', ' ')
-                is_paper_in_arxiv = True
-            else:  # different paper
-                abstract = abstract
-                is_paper_in_arxiv = False
-            paper = next(search.results())
-        except:
-            abstract = abstract
-            is_paper_in_arxiv = False
-        print(title)
-        print(author)
-        print(citation)
+        try: paper = next(search.results())
+        except: paper = None
+
+        is_match = paper is not None and string_similar(title, paper.title) > 0.90
+
+        # if the arxiv match failed, search the titles of the article's historical versions
+        if not is_match and ENABLE_ALL_VERSION_SEARCH:
+            other_versions_page_url = [tag['href'] for tag in result.select_one('.gs_flb').select('.gs_nph') if 'cluster' in tag['href']]
+            if len(other_versions_page_url) > 0:
+                other_versions_page_url = other_versions_page_url[0]
+                paper = search_all_version('http://' + urlparse(url).netloc + other_versions_page_url)
+                is_match = paper is not None and string_similar(title, paper.title) > 0.90
+
+        if is_match:
+            # same paper
+            abstract = paper.summary.replace('\n', ' ')
+            is_paper_in_arxiv = True
+        else:
+            # different paper
+            abstract = abstract
+            is_paper_in_arxiv = False
+
+        logging.info('[title]:' + title)
+        logging.info('[author]:' + author)
+        logging.info('[citation]:' + citation)
+
         profile.append({
-            'title':title,
-            'author':author,
-            'citation':citation,
-            'abstract':abstract,
-            'is_paper_in_arxiv':is_paper_in_arxiv,
+            'title': title,
+            'author': author,
+            'citation': citation,
+            'abstract': abstract,
+            'is_paper_in_arxiv': is_paper_in_arxiv,
         })
+
         chatbot[-1] = [chatbot[-1][0], title + f'\n\n是否在arxiv中(不在arxiv中无法获取完整摘要):{is_paper_in_arxiv}\n\n' + abstract]
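The is_match test above treats two titles as the same paper when difflib's quick_ratio exceeds 0.90. Standalone, with sample titles chosen purely for illustration:

```python
import difflib

def string_similar(s1, s2):
    # quick_ratio is an inexpensive upper bound on SequenceMatcher.ratio
    return difflib.SequenceMatcher(None, s1, s2).quick_ratio()

def is_same_paper(title_a, title_b, threshold=0.90):
    # mirrors the matching rule used in the hunk above
    return string_similar(title_a, title_b) > threshold

print(is_same_paper("Attention Is All You Need", "Attention Is All You Need"))  # True
print(is_same_paper("Attention Is All You Need", "Deep Residual Learning"))     # False
```

Because quick_ratio only compares character multisets, it can over-estimate similarity; with a high threshold like 0.90 that mostly costs false positives on near-anagram titles, which are rare in practice.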
@@ -65,6 +127,7 @@ def get_meta_information(url, chatbot, history):

 @CatchException
 def 谷歌检索小助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
+    disable_auto_promotion(chatbot=chatbot)
     # basic info: features and contributors
     chatbot.append([
         "函数插件功能?",
@@ -86,6 +149,9 @@ def 谷歌检索小助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
     # clear the history to avoid input overflow
     history = []
     meta_paper_info_list = yield from get_meta_information(txt, chatbot, history)
+    if len(meta_paper_info_list) == 0:
+        yield from update_ui_lastest_msg(lastmsg='获取文献失败,可能触发了google反爬虫机制。', chatbot=chatbot, history=history, delay=0)
+        return
     batchsize = 5
     for batch in range(math.ceil(len(meta_paper_info_list)/batchsize)):
         if len(meta_paper_info_list[:batchsize]) > 0:
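The loop above walks the result list five papers at a time, using math.ceil to count the batches. An index-based standalone sketch of that batching arithmetic (the helper name is illustrative):

```python
import math

def iter_batches(items, batchsize=5):
    # yield consecutive slices of at most `batchsize` items
    for batch in range(math.ceil(len(items) / batchsize)):
        yield items[batch * batchsize : (batch + 1) * batchsize]

batches = list(iter_batches(list(range(12)), batchsize=5))
print([len(b) for b in batches])  # [5, 5, 2]
```

math.ceil guarantees the final short slice still gets its own iteration, so no trailing items are dropped.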
@@ -107,6 +173,7 @@ def 谷歌检索小助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
                        "已经全部完成,您可以试试让AI写一个Related Works,例如您可以继续输入Write a \"Related Works\" section about \"你搜索的研究领域\" for me."])
     msg = '正常'
     yield from update_ui(chatbot=chatbot, history=history, msg=msg)  # refresh the UI
-    res = write_results_to_file(history)
-    chatbot.append(("完成了吗?", res));
+    path = write_history_to_file(history)
+    promote_file_to_downloadzone(path, chatbot=chatbot)
+    chatbot.append(("完成了吗?", path));
     yield from update_ui(chatbot=chatbot, history=history, msg=msg)  # refresh the UI
@@ -1,62 +1,2 @@
-# How to build | 如何构建: docker build -t gpt-academic --network=host -f Dockerfile+ChatGLM .
-# How to run | (1) 我想直接一键运行(选择0号GPU): docker run --rm -it --net=host --gpus \"device=0\" gpt-academic
-# How to run | (2) 我想运行之前进容器做一些调整(选择1号GPU): docker run --rm -it --net=host --gpus \"device=1\" gpt-academic bash
-
-# Build from the NVIDIA base image for GPU support (the cuda version in the host's nvidia-smi must be >= 11.3)
-FROM nvidia/cuda:11.3.1-runtime-ubuntu20.04
-ARG useProxyNetwork=''
-RUN apt-get update
-RUN apt-get install -y curl proxychains curl
-RUN apt-get install -y git python python3 python-dev python3-dev --fix-missing
-
-# Configure the proxy network (used while building the Docker image)
-# # comment out below if you do not need proxy network | 如果不需要翻墙 - 从此行向下删除
-RUN $useProxyNetwork curl cip.cc
-RUN sed -i '$ d' /etc/proxychains.conf
-RUN sed -i '$ d' /etc/proxychains.conf
-# Fill in the host's proxy protocol here (used to pull code from github)
-RUN echo "socks5 127.0.0.1 10880" >> /etc/proxychains.conf
-ARG useProxyNetwork=proxychains
-# # comment out above if you do not need proxy network | 如果不需要翻墙 - 从此行向上删除
-
-
-# use python3 as the system default python
-RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
-# Install pytorch
-RUN $useProxyNetwork python3 -m pip install torch --extra-index-url https://download.pytorch.org/whl/cu113
-# Clone the branch
-WORKDIR /gpt
-RUN $useProxyNetwork git clone https://github.com/binary-husky/gpt_academic.git
-WORKDIR /gpt/gpt_academic
-RUN $useProxyNetwork python3 -m pip install -r requirements.txt
-RUN $useProxyNetwork python3 -m pip install -r request_llm/requirements_chatglm.txt
-RUN $useProxyNetwork python3 -m pip install -r request_llm/requirements_newbing.txt
-
-# Warm up the ChatGLM weights (optional step)
-RUN echo ' \n\
-from transformers import AutoModel, AutoTokenizer \n\
-chatglm_tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True) \n\
-chatglm_model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float() ' >> warm_up_chatglm.py
-RUN python3 -u warm_up_chatglm.py
-
-# Bust the build cache to make sure the code is up to date
-ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
-RUN $useProxyNetwork git pull
-
-# Warm up the Tiktoken module
-RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
-
-# Configure proxy and API-KEY for chatgpt-academic (optional step)
-# Multiple API-KEYs may coexist (openai keys and api2d keys), comma-separated, e.g. API_KEY = "sk-openaikey1,fkxxxx-api2dkey2,........"
-# LLM_MODEL selects the initial model
-# LOCAL_MODEL_DEVICE selects the device for local models such as chatglm: cpu or cuda
-# [Note: the items below correspond one-to-one with `config.py`; consult config.py to complete them]
-RUN echo ' \n\
-API_KEY = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,fkxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \n\
-USE_PROXY = True \n\
-LLM_MODEL = "chatglm" \n\
-LOCAL_MODEL_DEVICE = "cuda" \n\
-proxies = { "http": "socks5h://localhost:10880", "https": "socks5h://localhost:10880", } ' >> config_private.py
-
-# Launch
-CMD ["python3", "-u", "main.py"]
+# This Dockerfile is no longer maintained; use docs/GithubAction+ChatGLM+Moss instead
@@ -1,59 +1 @@
-# How to build | 如何构建: docker build -t gpt-academic-jittor --network=host -f Dockerfile+ChatGLM .
-# How to run | (1) 我想直接一键运行(选择0号GPU): docker run --rm -it --net=host --gpus \"device=0\" gpt-academic-jittor bash
-# How to run | (2) 我想运行之前进容器做一些调整(选择1号GPU): docker run --rm -it --net=host --gpus \"device=1\" gpt-academic-jittor bash
-
-# Build from the NVIDIA base image for GPU support (the cuda version in the host's nvidia-smi must be >= 11.3)
-FROM nvidia/cuda:11.3.1-runtime-ubuntu20.04
-ARG useProxyNetwork=''
-RUN apt-get update
-RUN apt-get install -y curl proxychains curl g++
-RUN apt-get install -y git python python3 python-dev python3-dev --fix-missing
-
-# Configure the proxy network (used while building the Docker image)
-# # comment out below if you do not need proxy network | 如果不需要翻墙 - 从此行向下删除
-RUN $useProxyNetwork curl cip.cc
-RUN sed -i '$ d' /etc/proxychains.conf
-RUN sed -i '$ d' /etc/proxychains.conf
-# Fill in the host's proxy protocol here (used to pull code from github)
-RUN echo "socks5 127.0.0.1 10880" >> /etc/proxychains.conf
-ARG useProxyNetwork=proxychains
-# # comment out above if you do not need proxy network | 如果不需要翻墙 - 从此行向上删除
-
-
-# use python3 as the system default python
-RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
-# Install pytorch
-RUN $useProxyNetwork python3 -m pip install torch --extra-index-url https://download.pytorch.org/whl/cu113
-# Clone the branch
-WORKDIR /gpt
-RUN $useProxyNetwork git clone https://github.com/binary-husky/gpt_academic.git
-WORKDIR /gpt/gpt_academic
-RUN $useProxyNetwork python3 -m pip install -r requirements.txt
-RUN $useProxyNetwork python3 -m pip install -r request_llm/requirements_chatglm.txt
-RUN $useProxyNetwork python3 -m pip install -r request_llm/requirements_newbing.txt
-RUN $useProxyNetwork python3 -m pip install -r request_llm/requirements_jittorllms.txt -i https://pypi.jittor.org/simple -I
-
-# Clone JittorLLMs
-RUN $useProxyNetwork git clone https://github.com/binary-husky/JittorLLMs.git --depth 1 request_llm/jittorllms
-
-# Bust the build cache to make sure the code is up to date
-ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
-RUN $useProxyNetwork git pull
-
-# Warm up the Tiktoken module
-RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
-
-# Configure proxy and API-KEY for chatgpt-academic (optional step)
-# Multiple API-KEYs may coexist (openai keys and api2d keys), comma-separated, e.g. API_KEY = "sk-openaikey1,fkxxxx-api2dkey2,........"
-# LLM_MODEL selects the initial model
-# LOCAL_MODEL_DEVICE selects the device for local models such as chatglm: cpu or cuda
-# [Note: the items below correspond one-to-one with `config.py`; consult config.py to complete them]
-RUN echo ' \n\
-API_KEY = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,fkxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \n\
-USE_PROXY = True \n\
-LLM_MODEL = "chatglm" \n\
-LOCAL_MODEL_DEVICE = "cuda" \n\
-proxies = { "http": "socks5h://localhost:10880", "https": "socks5h://localhost:10880", } ' >> config_private.py
-
-# Launch
-CMD ["python3", "-u", "main.py"]
+# This Dockerfile is no longer maintained; use docs/GithubAction+JittorLLMs instead
@@ -1,27 +1 @@
-# This Dockerfile builds an environment without local models; to use local models such as chatglm, see docs/Dockerfile+ChatGLM
-# - 1 Edit `config.py`
-# - 2 Build: docker build -t gpt-academic-nolocal-latex -f docs/Dockerfile+NoLocal+Latex .
-# - 3 Run: docker run -v /home/fuqingxu/arxiv_cache:/root/arxiv_cache --rm -it --net=host gpt-academic-nolocal-latex
-
-FROM fuqingxu/python311_texlive_ctex:latest
-
-# Set the working directory
-WORKDIR /gpt
-
-ARG useProxyNetwork=''
-
-RUN $useProxyNetwork pip3 install gradio openai numpy arxiv rich -i https://pypi.douban.com/simple/
-RUN $useProxyNetwork pip3 install colorama Markdown pygments pymupdf -i https://pypi.douban.com/simple/
-
-# Copy in the project files
-COPY . .
-
-# Install dependencies
-RUN $useProxyNetwork pip3 install -r requirements.txt -i https://pypi.douban.com/simple/
-
-# Optional step: warm up modules
-RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
-
-# Launch
-CMD ["python3", "-u", "main.py"]
+# This Dockerfile is no longer maintained; use docs/GithubAction+NoLocal+Latex instead
37	docs/GithubAction+AllCapacity	(new file)
@@ -0,0 +1,37 @@
+# docker build -t gpt-academic-all-capacity -f docs/GithubAction+AllCapacity --network=host --build-arg http_proxy=http://localhost:10881 --build-arg https_proxy=http://localhost:10881 .
+
+# Build from the NVIDIA base image for GPU support (the cuda version in the host's nvidia-smi must be >= 11.3)
+FROM fuqingxu/11.3.1-runtime-ubuntu20.04-with-texlive:latest
+
+# use python3 as the system default python
+WORKDIR /gpt
+RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
+# Install pytorch
+RUN python3 -m pip install torch --extra-index-url https://download.pytorch.org/whl/cu113
+# Prepare pip dependencies
+RUN python3 -m pip install openai numpy arxiv rich
+RUN python3 -m pip install colorama Markdown pygments pymupdf
+RUN python3 -m pip install python-docx moviepy pdfminer
+RUN python3 -m pip install zh_langchain==0.2.1
+RUN python3 -m pip install nougat-ocr
+RUN python3 -m pip install rarfile py7zr
+RUN python3 -m pip install aliyun-python-sdk-core==2.13.3 pyOpenSSL scipy git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git
+# Clone the branch
+WORKDIR /gpt
+RUN git clone --depth=1 https://github.com/binary-husky/gpt_academic.git
+WORKDIR /gpt/gpt_academic
+RUN git clone https://github.com/OpenLMLab/MOSS.git request_llm/moss
+
+RUN python3 -m pip install -r requirements.txt
+RUN python3 -m pip install -r request_llm/requirements_moss.txt
+RUN python3 -m pip install -r request_llm/requirements_qwen.txt
+RUN python3 -m pip install -r request_llm/requirements_chatglm.txt
+RUN python3 -m pip install -r request_llm/requirements_newbing.txt
+
+# Warm up the Tiktoken module
+RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
+
+# Launch
+CMD ["python3", "-u", "main.py"]
@@ -1,7 +1,6 @@
 # Build from an NVIDIA base image for GPU support (the CUDA version reported by the host's nvidia-smi must be >= 11.3)
 FROM nvidia/cuda:11.3.1-runtime-ubuntu20.04
-ARG useProxyNetwork=''
 RUN apt-get update
 RUN apt-get install -y curl proxychains curl gcc
 RUN apt-get install -y git python python3 python-dev python3-dev --fix-missing
@@ -1,6 +1,6 @@
|
|||||||
# 此Dockerfile适用于“无本地模型”的环境构建,如果需要使用chatglm等本地模型,请参考 docs/Dockerfile+ChatGLM
|
# 此Dockerfile适用于“无本地模型”的环境构建,如果需要使用chatglm等本地模型,请参考 docs/Dockerfile+ChatGLM
|
||||||
# - 1 修改 `config.py`
|
# - 1 修改 `config.py`
|
||||||
# - 2 构建 docker build -t gpt-academic-nolocal-latex -f docs/Dockerfile+NoLocal+Latex .
|
# - 2 构建 docker build -t gpt-academic-nolocal-latex -f docs/GithubAction+NoLocal+Latex .
|
||||||
# - 3 运行 docker run -v /home/fuqingxu/arxiv_cache:/root/arxiv_cache --rm -it --net=host gpt-academic-nolocal-latex
|
# - 3 运行 docker run -v /home/fuqingxu/arxiv_cache:/root/arxiv_cache --rm -it --net=host gpt-academic-nolocal-latex
|
||||||
|
|
||||||
FROM fuqingxu/python311_texlive_ctex:latest
|
FROM fuqingxu/python311_texlive_ctex:latest
|
||||||
@@ -10,6 +10,10 @@ WORKDIR /gpt
|
|||||||
|
|
||||||
RUN pip3 install gradio openai numpy arxiv rich
|
RUN pip3 install gradio openai numpy arxiv rich
|
||||||
RUN pip3 install colorama Markdown pygments pymupdf
|
RUN pip3 install colorama Markdown pygments pymupdf
|
||||||
|
RUN pip3 install python-docx moviepy pdfminer
|
||||||
|
RUN pip3 install zh_langchain==0.2.1
|
||||||
|
RUN pip3 install nougat-ocr
|
||||||
|
RUN pip3 install aliyun-python-sdk-core==2.13.3 pyOpenSSL scipy git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git
|
||||||
|
|
||||||
# 装载项目文件
|
# 装载项目文件
|
||||||
COPY . .
|
COPY . .
|
||||||
|
|||||||
@@ -2448,5 +2448,49 @@
 "插件说明": "Plugin description",
 "├── CODE_HIGHLIGHT 代码高亮": "├── CODE_HIGHLIGHT Code highlighting",
 "记得用插件": "Remember to use the plugin",
-"谨慎操作": "Handle with caution"
+"谨慎操作": "Handle with caution",
+"请检查PDF是否损坏": "#",
+"执行成功了": "#",
+"请在输入框内填写需求": "#",
+"结果": "#",
+"开始干正事": "#",
+"次代码生成尝试": "#",
+"代码生成结束": "#",
+"Nougat解析论文失败": "#",
+"受到google限制": "#",
+"收尾": "#",
+"结果是一个有效文件": "#",
+"然后再次点击该插件": "#",
+"用插件实现」": "#",
+"文件路径": "#",
+"仅供测试": "#",
+"将csv文件转excel表格": "#",
+"开始执行": "#",
+"测试": "#",
+"睡一会防止触发google反爬虫": "#",
+"某段话的整个句子": "#",
+"使用tex格式公式 测试2 给出柯西不等式": "#",
+"找不到本地项目或无法处理": "#",
+"交换图像的蓝色通道和红色通道": "#",
+"第三步": "#",
+"返回给定的url解析出的arxiv_id": "#",
+"裁剪图像": "#",
+"已经被记忆": "#",
+"无法从bing获取信息!": "#",
+"可能触发了google反爬虫机制": "#",
+"检索文章的历史版本的题目": "#",
+"请配置讯飞星火大模型的XFYUN_APPID": "#",
+"执行失败了": "#",
+"需要花费较长时间下载NOUGAT参数": "#",
+"请检查": "#",
+"写入": "#",
+"下个句子中已经说完的部分": "#",
+"精准翻译PDF文档": "#",
+"解析python源代码项目": "#",
+"首先在arxiv上搜索": "#",
+"错误追踪": "#",
+"结果是一个字符串": "#",
+"由 test_on_sentence_end": "#",
+"获取文章摘要": "#",
+"受到bing限制": "#"
 }
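The block of new entries above all map to `"#"`, which appears to be a "not yet translated" placeholder in this translation table (my reading of the convention, not something the diff documents). A hedged sketch of how such a table could be consumed, with the fallback rule as an assumption:

```python
import json

# A tiny slice of the translation table; "#" is assumed to mean
# "no English translation yet -- fall back to the original string".
TRANSLATIONS = json.loads('''{
    "插件说明": "Plugin description",
    "执行成功了": "#"
}''')

def translate(s):
    t = TRANSLATIONS.get(s, s)   # unknown keys pass through unchanged
    return s if t == "#" else t
```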
@@ -88,5 +88,7 @@
 "辅助功能": "Accessibility",
 "虚空终端": "VoidTerminal",
 "解析PDF_基于GROBID": "ParsePDF_BasedOnGROBID",
-"虚空终端主路由": "VoidTerminalMainRoute"
+"虚空终端主路由": "VoidTerminalMainRoute",
+"批量翻译PDF文档_NOUGAT": "BatchTranslatePDFDocuments_NOUGAT",
+"解析PDF_基于NOUGAT": "ParsePDF_NOUGAT"
 }
@@ -20,4 +20,4 @@ arxiv
 rich
 pypdf2==2.12.1
 websocket-client
-scipdf_parser==0.3
+scipdf_parser>=0.3
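Relaxing `scipdf_parser==0.3` to `scipdf_parser>=0.3` lets pip pick any later release instead of exactly 0.3. A toy illustration of the two specifier semantics (real resolvers follow PEP 440; this numeric comparison is only a sketch for plain dotted versions):

```python
# Parse a dotted version string into a comparable tuple, e.g. "0.3" -> (0, 3).
def ver(s):
    return tuple(int(p) for p in s.split('.'))

# Check an installed version against a single '==' or '>=' specifier.
def satisfies(installed, op, pinned):
    if op == '==':
        return ver(installed) == ver(pinned)
    if op == '>=':
        return ver(installed) >= ver(pinned)
    raise ValueError(f"unsupported operator: {op}")
```

So a future `scipdf_parser` 0.5 satisfies the new `>=0.3` constraint but would have been rejected by the old `==0.3` pin.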
@@ -10,8 +10,9 @@ from tests.test_utils import plugin_test

 if __name__ == "__main__":
     # plugin_test(plugin='crazy_functions.虚空终端->虚空终端', main_input='修改api-key为sk-jhoejriotherjep')
+    plugin_test(plugin='crazy_functions.批量翻译PDF文档_NOUGAT->批量翻译PDF文档', main_input='crazy_functions/test_project/pdf_and_word/aaai.pdf')

-    plugin_test(plugin='crazy_functions.虚空终端->虚空终端', main_input='调用插件,对C:/Users/fuqingxu/Desktop/旧文件/gpt/chatgpt_academic/crazy_functions/latex_fns中的python文件进行解析')
+    # plugin_test(plugin='crazy_functions.虚空终端->虚空终端', main_input='调用插件,对C:/Users/fuqingxu/Desktop/旧文件/gpt/chatgpt_academic/crazy_functions/latex_fns中的python文件进行解析')

     # plugin_test(plugin='crazy_functions.命令行助手->命令行助手', main_input='查看当前的docker容器列表')
46  toolbox.py
@@ -281,8 +281,7 @@ def report_execption(chatbot, history, a, b):
     Append the error information to the chatbot
     """
     chatbot.append((a, b))
-    history.append(a)
-    history.append(b)
+    history.extend([a, b])


 def text_divide_paragraph(text):
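The `report_execption` hunk collapses two `append` calls into a single `extend`; the resulting history list is identical, as a quick check shows:

```python
# Two appends vs. one extend produce the same history list.
h1, h2 = [], []
a, b = "question", "error message"

h1.append(a)
h1.append(b)

h2.extend([a, b])

assert h1 == h2 == ["question", "error message"]
```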
@@ -305,6 +304,7 @@ def text_divide_paragraph(text):
     text = "</br>".join(lines)
     return pre + text + suf
+

 @lru_cache(maxsize=128)  # use an LRU cache to speed up conversion
 def markdown_convertion(txt):
     """
@@ -359,19 +359,41 @@ def markdown_convertion(txt):
         content = content.replace('</script>\n</script>', '</script>')
         return content

-    def no_code(txt):
-        if '```' not in txt:
-            return True
-        else:
-            if '```reference' in txt: return True # newbing
-            else: return False
+    def is_equation(txt):
+        """
+        Decide whether txt is a formula | test 1: write the Lorentz law as a TeX-format formula; test 2: state the Cauchy inequality in LaTeX format; test 3: write Maxwell's equations
+        """
+        if '```' in txt and '```reference' not in txt: return False
+        if '$' not in txt and '\\[' not in txt: return False
+        mathpatterns = {
+            r'(?<!\\|\$)(\$)([^\$]+)(\$)': {'allow_multi_lines': False},  # $...$
+            r'(?<!\\)(\$\$)([^\$]+)(\$\$)': {'allow_multi_lines': True},  # $$...$$
+            r'(?<!\\)(\\\[)(.+?)(\\\])': {'allow_multi_lines': False},  # \[...\]
+            # r'(?<!\\)(\\\()(.+?)(\\\))': {'allow_multi_lines': False},  # \(...\)
+            # r'(?<!\\)(\\begin{([a-z]+?\*?)})(.+?)(\\end{\2})': {'allow_multi_lines': True},  # \begin...\end
+            # r'(?<!\\)(\$`)([^`]+)(`\$)': {'allow_multi_lines': False},  # $`...`$
+        }
+        matches = []
+        for pattern, property in mathpatterns.items():
+            flags = re.ASCII|re.DOTALL if property['allow_multi_lines'] else re.ASCII
+            matches.extend(re.findall(pattern, txt, flags))
+        if len(matches) == 0: return False
+        contain_any_eq = False
+        illegal_pattern = re.compile(r'[^\x00-\x7F]|echo')
+        for match in matches:
+            if len(match) != 3: return False
+            eq_canidate = match[1]
+            if illegal_pattern.search(eq_canidate):
+                return False
+            else:
+                contain_any_eq = True
+        return contain_any_eq

-    if ('$' in txt) and no_code(txt):  # $-delimited formula markers are present and there is no ``` code-block marker
+    if is_equation(txt):  # $-delimited formula markers are present and there is no ``` code-block marker
         # convert everything to html format
         split = markdown.markdown(text='---')
-        convert_stage_1 = markdown.markdown(text=txt, extensions=['mdx_math', 'fenced_code', 'tables', 'sane_lists'], extension_configs=markdown_extension_configs)
+        convert_stage_1 = markdown.markdown(text=txt, extensions=['sane_lists', 'tables', 'mdx_math', 'fenced_code'], extension_configs=markdown_extension_configs)
         convert_stage_1 = markdown_bug_hunt(convert_stage_1)
-        # re.DOTALL: Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline. Corresponds to the inline flag (?s).
         # 1. convert to easy-to-copy tex (do not render math)
         convert_stage_2_1, n = re.subn(find_equation_pattern, replace_math_no_render, convert_stage_1, flags=re.DOTALL)
         # 2. convert to rendered equation
@@ -379,7 +401,7 @@ def markdown_convertion(txt):
         # cat them together
         return pre + convert_stage_2_1 + f'{split}' + convert_stage_2_2 + suf
     else:
-        return pre + markdown.markdown(txt, extensions=['fenced_code', 'codehilite', 'tables', 'sane_lists']) + suf
+        return pre + markdown.markdown(txt, extensions=['sane_lists', 'tables', 'fenced_code', 'codehilite']) + suf


 def close_up_code_segment_during_stream(gpt_reply):
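The new `is_equation` helper decides whether a chunk of model output should go through the math renderer: delimiters must pair up, and the captured equation body must be ASCII-only (so Chinese prose with stray `$` signs, or shell snippets containing `echo`, are not rendered as math). A self-contained sketch of the same detection logic, using only the three active patterns from the hunk:

```python
import re

# The three active delimiter patterns; each captures (open, body, close),
# so every findall() hit is a 3-tuple.
MATH_PATTERNS = {
    r'(?<!\\|\$)(\$)([^\$]+)(\$)':  {'allow_multi_lines': False},  # $...$
    r'(?<!\\)(\$\$)([^\$]+)(\$\$)': {'allow_multi_lines': True},   # $$...$$
    r'(?<!\\)(\\\[)(.+?)(\\\])':    {'allow_multi_lines': False},  # \[...\]
}

def is_equation(txt):
    """True if txt contains at least one clean, ASCII-only TeX equation."""
    if '```' in txt and '```reference' not in txt:
        return False                      # fenced code: render as code, not math
    if '$' not in txt and '\\[' not in txt:
        return False                      # no math delimiter at all
    matches = []
    for pattern, prop in MATH_PATTERNS.items():
        flags = re.ASCII | re.DOTALL if prop['allow_multi_lines'] else re.ASCII
        matches.extend(re.findall(pattern, txt, flags))
    if not matches:
        return False
    # Reject any candidate whose body holds non-ASCII text or shell-like 'echo'.
    illegal = re.compile(r'[^\x00-\x7F]|echo')
    contain_any_eq = False
    for match in matches:
        if len(match) != 3:
            return False
        if illegal.search(match[1]):
            return False
        contain_any_eq = True
    return contain_any_eq
```

Note the `illegal` guard only inspects the text *between* the delimiters, so `$一百元$` is rejected while `$a^2+b^2$` embedded in Chinese prose still counts as math.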
@@ -561,7 +583,7 @@ def on_file_uploaded(files, chatbot, txt, txt2, checkboxes, cookies):
     chatbot.append(['我上传了文件,请查收',
                     f'[Local Message] 收到以下文件: \n\n{moved_files_str}' +
                     f'\n\n调用路径参数已自动修正到: \n\n{txt}' +
-                    f'\n\n现在您点击任意“红颜色”标识的函数插件时,以上文件将被作为输入参数'+err_msg])
+                    f'\n\n现在您点击任意函数插件时,以上文件将被作为输入参数'+err_msg])
     cookies.update({
         'most_recent_uploaded': {
             'path': f'private_upload/{time_tag}',