Compare commits

..

18 commits

Author  SHA1  Message  Commit date
binary-husky  7415d532d1  solve the pdf concate error  2024-10-13 07:36:36 +00:00
binary-husky  97eef45ab7  Merge branch 'frontier' of github.com:binary-husky/chatgpt_academic into frontier  2024-10-01 11:59:14 +00:00
binary-husky  0c0e2acb9b  remove logging extra  2024-10-01 11:57:47 +00:00
Ren Lifei  9fba8e0142  Added some modules to support openrouter (#1975)  2024-09-28 18:05:34 +08:00
    * Added some modules for supporting openrouter model
    * Update config.py
    * Update .gitignore
    * Update bridge_openrouter.py
    * Not changed actually
    * Refactor logging in bridge_openrouter.py
    ---------
    Co-authored-by: binary-husky <qingxu.fu@outlook.com>
binary-husky  7d7867fb64  remove comment  2024-09-23 15:16:13 +00:00
binary-husky  f9dbaa39fb  Merge branch 'frontier' of github.com:binary-husky/chatgpt_academic into frontier  2024-09-21 15:40:24 +00:00
binary-husky  bbc2288c5b  relax llama index version  2024-09-21 15:40:10 +00:00
Steven Moder  64ab916838  fix: loguru argument error with proxy enabled (#1977)  2024-09-21 23:32:00 +08:00
binary-husky  8fe559da9f  update translation matrix  2024-09-21 14:56:10 +00:00
binary-husky  09fd22091a  fix: console output  2024-09-21 14:41:36 +00:00
binary-husky  e296719b23  Merge branch 'purge_print' into frontier  2024-09-16 09:56:25 +00:00
binary-husky  2f343179a2  logging -> loguru: final stage  2024-09-15 15:51:51 +00:00
binary-husky  4d9604f2e9  update social helper  2024-09-15 15:16:36 +00:00
binary-husky  bbf9e9f868  logging -> loguru stage 4  2024-09-14 16:00:09 +00:00
binary-husky  aa1f967dd7  support o1-preview and o1-mini  2024-09-13 03:11:53 +00:00
binary-husky  0d082327c8  logging -> loguru: stage 3  2024-09-11 08:49:55 +00:00
binary-husky  80acd9c875  import loguru: stage 2  2024-09-11 08:18:01 +00:00
binary-husky  17cd4f8210  logging sys to loguru: stage 1 complete  2024-09-11 03:30:30 +00:00
41 changed files with 399 additions and 913 deletions

View file

@@ -1,14 +1,14 @@
 # https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
-name: build-with-latex-arm
+name: build-with-all-capacity-beta
 on:
   push:
     branches:
-      - "master"
+      - 'master'
 env:
   REGISTRY: ghcr.io
-  IMAGE_NAME: ${{ github.repository }}_with_latex_arm
+  IMAGE_NAME: ${{ github.repository }}_with_all_capacity_beta
 jobs:
   build-and-push-image:
@@ -18,17 +18,11 @@ jobs:
       packages: write
     steps:
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v3
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
       - name: Checkout repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v3
       - name: Log in to the Container registry
-        uses: docker/login-action@v3
+        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
@@ -41,11 +35,10 @@ jobs:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      - name: Build and push Docker image
-        uses: docker/build-push-action@v6
+        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
-          platforms: linux/arm64
-          file: docs/GithubAction+NoLocal+Latex
+          file: docs/GithubAction+AllCapacityBeta
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

View file

@@ -0,0 +1,44 @@
+# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
+name: build-with-jittorllms
+on:
+  push:
+    branches:
+      - 'master'
+env:
+  REGISTRY: ghcr.io
+  IMAGE_NAME: ${{ github.repository }}_jittorllms
+jobs:
+  build-and-push-image:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v3
+      - name: Log in to the Container registry
+        uses: docker/login-action@v2
+        with:
+          registry: ${{ env.REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+      - name: Extract metadata (tags, labels) for Docker
+        id: meta
+        uses: docker/metadata-action@v4
+        with:
+          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+      - name: Build and push Docker image
+        uses: docker/build-push-action@v4
+        with:
+          context: .
+          push: true
+          file: docs/GithubAction+JittorLLMs
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}

.gitignore (vendored), 2 changed lines
View file

@@ -161,5 +161,3 @@ temp.*
 objdump*
 *.min.*.js
 TODO
-experimental_mods
-search_results

View file

@@ -1,6 +1,5 @@
 > [!IMPORTANT]
-> 2024.10.10: 突发停电,紧急恢复了提供[whl包](https://drive.google.com/file/d/19U_hsLoMrjOlQSzYS3pzWX9fTzyusArP/view?usp=sharing)的文件服务器
-> 2024.10.8: 版本3.90加入对llama-index的初步支持,版本3.80加入插件二级菜单功能详见wiki
+> 2024.6.1: 版本3.80加入插件二级菜单功能详见wiki
 > 2024.5.1: 加入Doc2x翻译PDF论文的功能,[查看详情](https://github.com/binary-husky/gpt_academic/wiki/Doc2x)
 > 2024.3.11: 全力支持Qwen、GLM、DeepseekCoder等中文大语言模型 SoVits语音克隆模块,[查看详情](https://www.bilibili.com/video/BV1Rp421S7tF/)
 > 2024.1.17: 安装依赖时,请选择`requirements.txt`中**指定的版本**。 安装命令:`pip install -r requirements.txt`。本项目完全开源免费,您可通过订阅[在线服务](https://github.com/binary-husky/gpt_academic/wiki/online)的方式鼓励本项目的发展。

View file

@@ -1,36 +1,24 @@
 from loguru import logger
 def check_proxy(proxies, return_ip=False):
-    """
-    检查代理配置并返回结果。
-    Args:
-        proxies (dict): 包含http和https代理配置的字典。
-        return_ip (bool, optional): 是否返回代理的IP地址。默认为False。
-    Returns:
-        str or None: 检查的结果信息或代理的IP地址(如果`return_ip`为True)
-    """
     import requests
     proxies_https = proxies['https'] if proxies is not None else ''
     ip = None
     try:
-        response = requests.get("https://ipapi.co/json/", proxies=proxies, timeout=4) # ⭐ 执行GET请求以获取代理信息
+        response = requests.get("https://ipapi.co/json/", proxies=proxies, timeout=4)
         data = response.json()
         if 'country_name' in data:
             country = data['country_name']
             result = f"代理配置 {proxies_https}, 代理所在地:{country}"
-            if 'ip' in data:
-                ip = data['ip']
+            if 'ip' in data: ip = data['ip']
         elif 'error' in data:
-            alternative, ip = _check_with_backup_source(proxies) # ⭐ 调用备用方法检查代理配置
+            alternative, ip = _check_with_backup_source(proxies)
             if alternative is None:
                 result = f"代理配置 {proxies_https}, 代理所在地未知,IP查询频率受限"
             else:
                 result = f"代理配置 {proxies_https}, 代理所在地:{alternative}"
         else:
             result = f"代理配置 {proxies_https}, 代理数据解析失败:{data}"
         if not return_ip:
             logger.warning(result)
             return result
@@ -45,33 +33,17 @@ def check_proxy(proxies, return_ip=False):
         return ip
 def _check_with_backup_source(proxies):
-    """
-    通过备份源检查代理,并获取相应信息。
-    Args:
-        proxies (dict): 包含代理信息的字典。
-    Returns:
-        tuple: 代理信息(geo)和IP地址(ip)的元组。
-    """
     import random, string, requests
     random_string = ''.join(random.choices(string.ascii_letters + string.digits, k=32))
     try:
-        res_json = requests.get(f"http://{random_string}.edns.ip-api.com/json", proxies=proxies, timeout=4).json() # ⭐ 执行代理检查和备份源请求
+        res_json = requests.get(f"http://{random_string}.edns.ip-api.com/json", proxies=proxies, timeout=4).json()
         return res_json['dns']['geo'], res_json['dns']['ip']
     except:
         return None, None
 def backup_and_download(current_version, remote_version):
     """
-    一键更新协议:备份当前版本,下载远程版本并解压缩。
-    Args:
-        current_version (str): 当前版本号。
-        remote_version (str): 远程版本号。
-    Returns:
-        str: 新版本目录的路径。
+    一键更新协议:备份和下载
     """
     from toolbox import get_conf
     import shutil
@@ -88,7 +60,7 @@ def backup_and_download(current_version, remote_version):
     proxies = get_conf('proxies')
     try: r = requests.get('https://github.com/binary-husky/chatgpt_academic/archive/refs/heads/master.zip', proxies=proxies, stream=True)
     except: r = requests.get('https://public.agent-matrix.com/publish/master.zip', proxies=proxies, stream=True)
-    zip_file_path = backup_dir+'/master.zip' # ⭐ 保存备份文件的路径
+    zip_file_path = backup_dir+'/master.zip'
     with open(zip_file_path, 'wb+') as f:
         f.write(r.content)
     dst_path = new_version_dir
@@ -104,17 +76,6 @@ def backup_and_download(current_version, remote_version):
 def patch_and_restart(path):
     """
     一键更新协议:覆盖和重启
-    Args:
-        path (str): 新版本代码所在的路径
-    注意事项:
-        如果您的程序没有使用config_private.py私密配置文件,则会将config.py重命名为config_private.py以避免配置丢失。
-    更新流程:
-    - 复制最新版本代码到当前目录
-    - 更新pip包依赖
-    - 如果更新失败,则提示手动安装依赖库并重启
     """
     from distutils import dir_util
     import shutil
@@ -123,43 +84,32 @@ def patch_and_restart(path):
     import time
     import glob
     from shared_utils.colorful import log亮黄, log亮绿, log亮红
+    # if not using config_private, move origin config.py as config_private.py
     if not os.path.exists('config_private.py'):
         log亮黄('由于您没有设置config_private.py私密配置,现将您的现有配置移动至config_private.py以防止配置丢失,',
               '另外您可以随时在history子文件夹下找回旧版的程序。')
         shutil.copyfile('config.py', 'config_private.py')
     path_new_version = glob.glob(path + '/*-master')[0]
-    dir_util.copy_tree(path_new_version, './') # ⭐ 将最新版本代码复制到当前目录
+    dir_util.copy_tree(path_new_version, './')
     log亮绿('代码已经更新,即将更新pip包依赖……')
     for i in reversed(range(5)): time.sleep(1); log亮绿(i)
     try:
         import subprocess
         subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-r', 'requirements.txt'])
     except:
         log亮红('pip包依赖安装出现问题,需要手动安装新增的依赖库 `python -m pip install -r requirements.txt`,然后在用常规的`python main.py`的方式启动。')
     log亮绿('更新完成,您可以随时在history子文件夹下找回旧版的程序,5s之后重启')
     log亮红('假如重启失败,您可能需要手动安装新增的依赖库 `python -m pip install -r requirements.txt`,然后在用常规的`python main.py`的方式启动。')
     log亮绿(' ------------------------------ -----------------------------------')
     for i in reversed(range(8)): time.sleep(1); log亮绿(i)
-    os.execl(sys.executable, sys.executable, *sys.argv) # 重启程序
+    os.execl(sys.executable, sys.executable, *sys.argv)
 def get_current_version():
-    """
-    获取当前的版本号。
-    Returns:
-        str: 当前的版本号。如果无法获取版本号,则返回空字符串。
-    """
     import json
     try:
         with open('./version', 'r', encoding='utf8') as f:
-            current_version = json.loads(f.read())['version'] # ⭐ 从读取的json数据中提取版本号
+            current_version = json.loads(f.read())['version']
     except:
         current_version = ""
     return current_version
@@ -168,12 +118,6 @@ def get_current_version():
 def auto_update(raise_error=False):
     """
     一键更新协议:查询版本和用户意见
-    Args:
-        raise_error (bool, optional): 是否在出错时抛出错误。默认为 False。
-    Returns:
-        None
     """
     try:
         from toolbox import get_conf
@@ -193,13 +137,13 @@ def auto_update(raise_error=False):
             current_version = json.loads(current_version)['version']
             if (remote_version - current_version) >= 0.01-1e-5:
                 from shared_utils.colorful import log亮黄
-                log亮黄(f'\n新版本可用。新版本:{remote_version},当前版本:{current_version}{new_feature}') # ⭐ 在控制台打印新版本信息
+                log亮黄(f'\n新版本可用。新版本:{remote_version},当前版本:{current_version}{new_feature}')
                 logger.info('(1)Github更新地址:\nhttps://github.com/binary-husky/chatgpt_academic\n')
                 user_instruction = input('(2)是否一键更新代码(Y+回车=确认,输入其他/无输入+回车=不更新)?')
                 if user_instruction in ['Y', 'y']:
-                    path = backup_and_download(current_version, remote_version) # ⭐ 备份并下载文件
+                    path = backup_and_download(current_version, remote_version)
                     try:
-                        patch_and_restart(path) # ⭐ 执行覆盖并重启操作
+                        patch_and_restart(path)
                     except:
                         msg = '更新失败。'
                         if raise_error:
@@ -219,9 +163,6 @@ def auto_update(raise_error=False):
         logger.info(msg)
 def warm_up_modules():
-    """
-    预热模块,加载特定模块并执行预热操作。
-    """
     logger.info('正在执行一些模块的预热 ...')
     from toolbox import ProxyNetworkActivate
     from request_llms.bridge_all import model_info
@@ -232,16 +173,6 @@ def warm_up_modules():
     enc.encode("模块预热", disallowed_special=())
 def warm_up_vectordb():
-    """
-    执行一些模块的预热操作。
-    本函数主要用于执行一些模块的预热操作,确保在后续的流程中能够顺利运行。
-    ⭐ 关键作用:预热模块
-    Returns:
-        None
-    """
     logger.info('正在执行一些模块的预热 ...')
     from toolbox import ProxyNetworkActivate
     with ProxyNetworkActivate("Warmup_Modules"):
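In `auto_update` above, version strings are parsed to floats and compared against a 0.01 threshold with a small epsilon. A sketch of just that gate (the function name `needs_update` is illustrative, not the project's):

```python
# Float-based version gate from auto_update: offer an update only when the
# remote version is at least 0.01 ahead; the 1e-5 epsilon absorbs binary
# floating-point rounding (3.90 - 3.89 is not exactly 0.01 as a float).
def needs_update(current_version: float, remote_version: float) -> bool:
    return (remote_version - current_version) >= 0.01 - 1e-5
```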

View file

@@ -6,6 +6,7 @@ from loguru import logger
 def get_crazy_functions():
     from crazy_functions.读文章写摘要 import 读文章写摘要
     from crazy_functions.生成函数注释 import 批量生成函数注释
+    from crazy_functions.Rag_Interface import Rag问答
     from crazy_functions.SourceCode_Analyse import 解析项目本身
     from crazy_functions.SourceCode_Analyse import 解析一个Python项目
     from crazy_functions.SourceCode_Analyse import 解析一个Matlab项目
@@ -49,9 +50,15 @@ def get_crazy_functions():
     from crazy_functions.Image_Generate import 图片生成_DALLE2, 图片生成_DALLE3, 图片修改_DALLE2
     from crazy_functions.Image_Generate_Wrap import ImageGen_Wrap
     from crazy_functions.SourceCode_Comment import 注释Python项目
-    from crazy_functions.SourceCode_Comment_Wrap import SourceCodeComment_Wrap
     function_plugins = {
+        "Rag智能召回": {
+            "Group": "对话",
+            "Color": "stop",
+            "AsButton": False,
+            "Info": "将问答数据记录到向量库中,作为长期参考。",
+            "Function": HotReload(Rag问答),
+        },
         "虚空终端": {
             "Group": "对话|编程|学术|智能体",
             "Color": "stop",
@@ -72,7 +79,6 @@ def get_crazy_functions():
             "AsButton": False,
             "Info": "上传一系列python源文件(或者压缩包), 为这些代码添加docstring | 输入参数为路径",
             "Function": HotReload(注释Python项目),
-            "Class": SourceCodeComment_Wrap,
         },
         "载入对话历史存档(先上传存档或输入路径)": {
             "Group": "对话",
@@ -701,31 +707,6 @@ def get_crazy_functions():
         logger.error(trimmed_format_exc())
         logger.error("Load function plugin failed")
-    try:
-        from crazy_functions.Rag_Interface import Rag问答
-        function_plugins.update(
-            {
-                "Rag智能召回": {
-                    "Group": "对话",
-                    "Color": "stop",
-                    "AsButton": False,
-                    "Info": "将问答数据记录到向量库中,作为长期参考。",
-                    "Function": HotReload(Rag问答),
-                },
-            }
-        )
-    except:
-        logger.error(trimmed_format_exc())
-        logger.error("Load function plugin failed")
     # try:
     #     from crazy_functions.高级功能函数模板 import 测试图表渲染
     #     function_plugins.update({
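The hunk above moves the `Rag智能召回` entry out of a guarded try/except registration and into the main `function_plugins` dict. The guarded pattern it removes can be sketched generically; `register_optional` is an illustrative helper, not the project's API, and a module that fails to import disables only its own plugin:

```python
import importlib

function_plugins = {}

def register_optional(name, module_name, attr, meta):
    """Register a plugin only if its module imports cleanly (illustrative helper)."""
    try:
        fn = getattr(importlib.import_module(module_name), attr)
        function_plugins[name] = dict(meta, Function=fn)
    except Exception:
        pass  # dependency missing: skip this plugin, keep the rest of the registry usable

# stdlib stand-in succeeds; the project-specific module is absent here, so it is skipped
register_optional("json_dump_demo", "json", "dumps", {"Group": "demo", "AsButton": False})
register_optional("Rag智能召回", "crazy_functions.Rag_Interface", "Rag问答", {"Group": "对话"})
```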

View file

@@ -3,7 +3,7 @@ from toolbox import CatchException, report_exception, update_ui_lastest_msg, zip
 from functools import partial
 from loguru import logger
-import glob, os, requests, time, json, tarfile, threading
+import glob, os, requests, time, json, tarfile
 pj = os.path.join
 ARXIV_CACHE_DIR = get_conf("ARXIV_CACHE_DIR")
@@ -138,43 +138,25 @@ def arxiv_download(chatbot, history, txt, allow_cache=True):
     cached_translation_pdf = check_cached_translation_pdf(arxiv_id)
     if cached_translation_pdf and allow_cache: return cached_translation_pdf, arxiv_id
-    extract_dst = pj(ARXIV_CACHE_DIR, arxiv_id, 'extract')
+    url_tar = url_.replace('/abs/', '/e-print/')
     translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'e-print')
-    dst = pj(translation_dir, arxiv_id + '.tar')
+    extract_dst = pj(ARXIV_CACHE_DIR, arxiv_id, 'extract')
     os.makedirs(translation_dir, exist_ok=True)
     # <-------------- download arxiv source file ------------->
+    dst = pj(translation_dir, arxiv_id + '.tar')
-    def fix_url_and_download():
-        # for url_tar in [url_.replace('/abs/', '/e-print/'), url_.replace('/abs/', '/src/')]:
-        for url_tar in [url_.replace('/abs/', '/src/'), url_.replace('/abs/', '/e-print/')]:
-            proxies = get_conf('proxies')
-            r = requests.get(url_tar, proxies=proxies)
-            if r.status_code == 200:
-                with open(dst, 'wb+') as f:
-                    f.write(r.content)
-                return True
-        return False
-    if os.path.exists(dst) and allow_cache:
-        yield from update_ui_lastest_msg(f"调用缓存 {arxiv_id}", chatbot=chatbot, history=history) # 刷新界面
-        success = True
+    if os.path.exists(dst):
+        yield from update_ui_lastest_msg("调用缓存", chatbot=chatbot, history=history) # 刷新界面
     else:
-        yield from update_ui_lastest_msg(f"开始下载 {arxiv_id}", chatbot=chatbot, history=history) # 刷新界面
-        success = fix_url_and_download()
-        yield from update_ui_lastest_msg(f"下载完成 {arxiv_id}", chatbot=chatbot, history=history) # 刷新界面
+        yield from update_ui_lastest_msg("开始下载", chatbot=chatbot, history=history) # 刷新界面
+        proxies = get_conf('proxies')
+        r = requests.get(url_tar, proxies=proxies)
+        with open(dst, 'wb+') as f:
+            f.write(r.content)
-    if not success:
-        yield from update_ui_lastest_msg(f"下载失败 {arxiv_id}", chatbot=chatbot, history=history)
-        raise tarfile.ReadError(f"论文下载失败 {arxiv_id}")
     # <-------------- extract file ------------->
+    yield from update_ui_lastest_msg("下载完成", chatbot=chatbot, history=history) # 刷新界面
     from toolbox import extract_archive
-    try:
-        extract_archive(file_path=dst, dest_dir=extract_dst)
-    except tarfile.ReadError:
-        os.remove(dst)
-        raise tarfile.ReadError(f"论文下载失败")
+    extract_archive(file_path=dst, dest_dir=extract_dst)
     return extract_dst, arxiv_id
@@ -338,17 +320,11 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
     # <-------------- more requirements ------------->
     if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
     more_req = plugin_kwargs.get("advanced_arg", "")
-    no_cache = ("--no-cache" in more_req)
-    if no_cache: more_req = more_req.replace("--no-cache", "").strip()
-    allow_gptac_cloud_io = ("--allow-cloudio" in more_req) # 从云端下载翻译结果,以及上传翻译结果到云端
-    if allow_gptac_cloud_io: more_req = more_req.replace("--allow-cloudio", "").strip()
+    no_cache = more_req.startswith("--no-cache")
+    if no_cache: more_req.lstrip("--no-cache")
     allow_cache = not no_cache
     _switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
     # <-------------- check deps ------------->
     try:
         import glob, os, time, subprocess
@@ -375,20 +351,6 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
         yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
         return
-    #################################################################
-    if allow_gptac_cloud_io and arxiv_id:
-        # 访问 GPTAC学术云,查询云端是否存在该论文的翻译版本
-        from crazy_functions.latex_fns.latex_actions import check_gptac_cloud
-        success, downloaded = check_gptac_cloud(arxiv_id, chatbot)
-        if success:
-            chatbot.append([
-                f"检测到GPTAC云端存在翻译版本, 如果不满意翻译结果, 请禁用云端分享, 然后重新执行。",
-                None
-            ])
-            yield from update_ui(chatbot=chatbot, history=history)
-            return
-    #################################################################
     if os.path.exists(txt):
         project_folder = txt
     else:
@@ -426,21 +388,14 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
     # <-------------- zip PDF ------------->
     zip_res = zip_result(project_folder)
     if success:
-        if allow_gptac_cloud_io and arxiv_id:
-            # 如果用户允许,我们将翻译好的arxiv论文PDF上传到GPTAC学术云
-            from crazy_functions.latex_fns.latex_actions import upload_to_gptac_cloud_if_user_allow
-            threading.Thread(target=upload_to_gptac_cloud_if_user_allow,
-                             args=(chatbot, arxiv_id), daemon=True).start()
         chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
-        yield from update_ui(chatbot=chatbot, history=history)
-        time.sleep(1) # 刷新界面
+        yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
         promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
     else:
         chatbot.append((f"失败了",
             '虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 您可以到Github Issue区, 用该压缩包进行反馈。如系统是Linux,请检查系统字体见Github wiki ...'))
-        yield from update_ui(chatbot=chatbot, history=history)
-        time.sleep(1) # 刷新界面
+        yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
         promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
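The removed `fix_url_and_download` helper tried the `/src/` mirror before `/e-print/` when fetching the arXiv source tarball. The URL rewriting alone can be sketched without any network access (`candidate_tar_urls` is an illustrative name, not the project's):

```python
def candidate_tar_urls(abs_url: str) -> list:
    """Rewrite an arXiv /abs/ page URL into the two tarball endpoints, tried in order."""
    return [abs_url.replace('/abs/', '/src/'), abs_url.replace('/abs/', '/e-print/')]
```

The caller would then request each candidate in turn and keep the first response with HTTP status 200, as the removed helper did.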

View file

@@ -30,8 +30,6 @@ class Arxiv_Localize(GptAcademicPluginTemplate):
                     default_value="", type="string").model_dump_json(), # 高级参数输入区,自动同步
             "allow_cache":
                 ArgProperty(title="是否允许从缓存中调取结果", options=["允许缓存", "从头执行"], default_value="允许缓存", description="", type="dropdown").model_dump_json(),
-            "allow_cloudio":
-                ArgProperty(title="是否允许从GPTAC学术云下载(或者上传)翻译结果(仅针对Arxiv论文)", options=["允许", "禁止"], default_value="禁止", description="共享文献,互助互利", type="dropdown").model_dump_json(),
         }
         return gui_definition
@@ -40,14 +38,9 @@ class Arxiv_Localize(GptAcademicPluginTemplate):
         执行插件
         """
         allow_cache = plugin_kwargs["allow_cache"]
-        allow_cloudio = plugin_kwargs["allow_cloudio"]
         advanced_arg = plugin_kwargs["advanced_arg"]
         if allow_cache == "从头执行": plugin_kwargs["advanced_arg"] = "--no-cache " + plugin_kwargs["advanced_arg"]
-        # 从云端下载翻译结果,以及上传翻译结果到云端;人人为我,我为人人。
-        if allow_cloudio == "允许": plugin_kwargs["advanced_arg"] = "--allow-cloudio " + plugin_kwargs["advanced_arg"]
         yield from Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
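Both sides of these diffs parse `--no-cache` out of `advanced_arg`, which `Arxiv_Localize` prepends to the string. The removed code tested for the flag as a substring anywhere and stripped it; the replacement only matches a leading flag, and its `more_req.lstrip("--no-cache")` is a no-op: `str.lstrip` removes a character set rather than a prefix, and the return value is discarded anyway. A sketch of both styles, with the prefix removal done explicitly (function names are illustrative):

```python
def parse_no_cache_substring(more_req: str):
    # Old style: flag detected anywhere in the string, then removed.
    no_cache = "--no-cache" in more_req
    if no_cache:
        more_req = more_req.replace("--no-cache", "").strip()
    return no_cache, more_req

def parse_no_cache_prefix(more_req: str):
    # New style: flag only honored at the front, where the wrapper prepends it.
    no_cache = more_req.startswith("--no-cache")
    if no_cache:
        more_req = more_req[len("--no-cache"):].strip()  # correct prefix removal
    return no_cache, more_req
```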

View file

@@ -65,7 +65,7 @@ def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
         pfg.file_contents.append(file_content)
     # <-------- 拆分过长的Markdown文件 ---------->
-    pfg.run_file_split(max_token_limit=1024)
+    pfg.run_file_split(max_token_limit=2048)
     n_split = len(pfg.sp_file_contents)
     # <-------- 多线程翻译开始 ---------->
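The hunk above only raises the limit passed to `pfg.run_file_split` from 1024 to 2048 tokens; the splitter's internals are not shown in this diff. A rough stand-alone sketch of token-bounded splitting, assuming about 4 characters per token (the real project uses an actual tokenizer, and `split_by_token_budget` is an illustrative name):

```python
def split_by_token_budget(text: str, max_token_limit: int = 2048, chars_per_token: int = 4) -> list:
    """Cut text into pieces whose estimated token count stays under the limit."""
    max_chars = max_token_limit * chars_per_token
    # Slice in fixed strides; always return at least one (possibly empty) chunk.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)] or [""]
```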

View file

@@ -2,7 +2,20 @@ from toolbox import CatchException, update_ui, get_conf, get_log_folder, update_
 from crazy_functions.crazy_utils import input_clipping
 from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
+VECTOR_STORE_TYPE = "Milvus"
+if VECTOR_STORE_TYPE == "Milvus":
+    try:
+        from crazy_functions.rag_fns.milvus_worker import MilvusRagWorker as LlamaIndexRagWorker
+    except:
+        VECTOR_STORE_TYPE = "Simple"
+if VECTOR_STORE_TYPE == "Simple":
+    from crazy_functions.rag_fns.llama_index_worker import LlamaIndexRagWorker
 RAG_WORKER_REGISTER = {}
 MAX_HISTORY_ROUND = 5
 MAX_CONTEXT_TOKEN_LIMIT = 4096
 REMEMBER_PREVIEW = 1000
@@ -10,16 +23,6 @@ REMEMBER_PREVIEW = 1000
 @CatchException
 def Rag问答(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
-    # import vector store lib
-    VECTOR_STORE_TYPE = "Milvus"
-    if VECTOR_STORE_TYPE == "Milvus":
-        try:
-            from crazy_functions.rag_fns.milvus_worker import MilvusRagWorker as LlamaIndexRagWorker
-        except:
-            VECTOR_STORE_TYPE = "Simple"
-    if VECTOR_STORE_TYPE == "Simple":
-        from crazy_functions.rag_fns.llama_index_worker import LlamaIndexRagWorker
     # 1. we retrieve rag worker from global context
     user_name = chatbot.get_user()
     checkpoint_dir = get_log_folder(user_name, plugin_name='experimental_rag')
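The hunk hoists the vector-store backend selection from inside `Rag问答` to module level, so the Milvus-or-Simple choice is made once at import time rather than on every call. The pattern, sketched generically with stdlib stand-ins (`this_backend_does_not_exist` is deliberately fake to force the fallback; the real code falls back from `MilvusRagWorker` to `LlamaIndexRagWorker`):

```python
BACKEND = "fast"
if BACKEND == "fast":
    try:
        import this_backend_does_not_exist as worker  # optional heavy dependency (fake here)
    except ImportError:
        BACKEND = "simple"  # fall back silently when the import fails
if BACKEND == "simple":
    import json as worker  # stdlib stand-in for the always-available fallback backend
```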

View file

@@ -6,10 +6,7 @@ from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_ver
 from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
 from crazy_functions.agent_fns.python_comment_agent import PythonCodeComment
 from crazy_functions.diagram_fns.file_tree import FileNode
-from crazy_functions.agent_fns.watchdog import WatchDog
 from shared_utils.advanced_markdown_format import markdown_convertion_for_file
-from loguru import logger
 def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
@@ -27,13 +24,12 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
         file_tree_struct.add_file(file_path, file_path)
     # <第一步,逐个文件分析,多线程>
-    lang = "" if not plugin_kwargs["use_chinese"] else " (you must use Chinese)"
     for index, fp in enumerate(file_manifest):
         # 读取文件
         with open(fp, 'r', encoding='utf-8', errors='replace') as f:
             file_content = f.read()
         prefix = ""
-        i_say = prefix + f'Please conclude the following source code at {os.path.relpath(fp, project_folder)} with only one sentence{lang}, the code is:\n```{file_content}```'
+        i_say = prefix + f'Please conclude the following source code at {os.path.relpath(fp, project_folder)} with only one sentence, the code is:\n```{file_content}```'
         i_say_show_user = prefix + f'[{index+1}/{len(file_manifest)}] 请用一句话对下面的程序文件做一个整体概述: {fp}'
         # 装载请求内容
         MAX_TOKEN_SINGLE_FILE = 2560
@@ -41,7 +37,7 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
         inputs_array.append(i_say)
         inputs_show_user_array.append(i_say_show_user)
         history_array.append([])
-        sys_prompt_array.append(f"You are a software architecture analyst analyzing a source code project. Do not dig into details, tell me what the code is doing in general. Your answer must be short, simple and clear{lang}.")
+        sys_prompt_array.append("You are a software architecture analyst analyzing a source code project. Do not dig into details, tell me what the code is doing in general. Your answer must be short, simple and clear.")
     # 文件读取完成,对每一个源代码文件,生成一个请求线程,发送到大模型进行分析
     gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
         inputs_array = inputs_array,
@@ -54,20 +50,10 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
     )
     # <第二步,逐个文件分析,生成带注释文件>
-    tasks = ["" for _ in range(len(file_manifest))]
-    def bark_fn(tasks):
-        for i in range(len(tasks)): tasks[i] = "watchdog is dead"
-    wd = WatchDog(timeout=10, bark_fn=lambda: bark_fn(tasks), interval=3, msg="ThreadWatcher timeout")
-    wd.begin_watch()
     from concurrent.futures import ThreadPoolExecutor
     executor = ThreadPoolExecutor(max_workers=get_conf('DEFAULT_WORKER_NUM'))
-    def _task_multi_threading(i_say, gpt_say, fp, file_tree_struct, index):
-        language = 'Chinese' if plugin_kwargs["use_chinese"] else 'English'
-        def observe_window_update(x):
-            if tasks[index] == "watchdog is dead":
-                raise TimeoutError("ThreadWatcher: watchdog is dead")
-            tasks[index] = x
-        pcc = PythonCodeComment(llm_kwargs, plugin_kwargs, language=language, observe_window_update=observe_window_update)
+    def _task_multi_threading(i_say, gpt_say, fp, file_tree_struct):
+        pcc = PythonCodeComment(llm_kwargs, language='English')
         pcc.read_file(path=fp, brief=gpt_say)
         revised_path, revised_content = pcc.begin_comment_source_code(None, None)
         file_tree_struct.manifest[fp].revised_path = revised_path
@@ -79,8 +65,7 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
     with open("crazy_functions/agent_fns/python_comment_compare.html", 'r', encoding='utf-8') as f:
         html_template = f.read()
     warp = lambda x: "```python\n\n" + x + "\n\n```"
-    from themes.theme import load_dynamic_theme
-    _, advanced_css, _, _ = load_dynamic_theme("Default")
+    from themes.theme import advanced_css
     html_template = html_template.replace("ADVANCED_CSS", advanced_css)
     html_template = html_template.replace("REPLACE_CODE_FILE_LEFT", pcc.get_markdown_block_in_html(markdown_convertion_for_file(warp(pcc.original_content))))
     html_template = html_template.replace("REPLACE_CODE_FILE_RIGHT", pcc.get_markdown_block_in_html(markdown_convertion_for_file(warp(revised_content))))
@@ -88,21 +73,17 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
file_tree_struct.manifest[fp].compare_html = compare_html_path file_tree_struct.manifest[fp].compare_html = compare_html_path
with open(compare_html_path, 'w', encoding='utf-8') as f: with open(compare_html_path, 'w', encoding='utf-8') as f:
f.write(html_template) f.write(html_template)
tasks[index] = "" # print('done 1')
chatbot.append([None, f"正在处理:"]) chatbot.append([None, f"正在处理:"])
futures = [] futures = []
index = 0
for i_say, gpt_say, fp in zip(gpt_response_collection[0::2], gpt_response_collection[1::2], file_manifest): for i_say, gpt_say, fp in zip(gpt_response_collection[0::2], gpt_response_collection[1::2], file_manifest):
future = executor.submit(_task_multi_threading, i_say, gpt_say, fp, file_tree_struct, index) future = executor.submit(_task_multi_threading, i_say, gpt_say, fp, file_tree_struct)
index += 1
futures.append(future) futures.append(future)
# <第三步,等待任务完成>
cnt = 0 cnt = 0
while True: while True:
cnt += 1 cnt += 1
wd.feed()
time.sleep(3) time.sleep(3)
worker_done = [h.done() for h in futures] worker_done = [h.done() for h in futures]
remain = len(worker_done) - sum(worker_done) remain = len(worker_done) - sum(worker_done)
@@ -111,18 +92,14 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
preview_html_list = [] preview_html_list = []
for done, fp in zip(worker_done, file_manifest): for done, fp in zip(worker_done, file_manifest):
if not done: continue if not done: continue
if hasattr(file_tree_struct.manifest[fp], 'compare_html'): preview_html_list.append(file_tree_struct.manifest[fp].compare_html)
preview_html_list.append(file_tree_struct.manifest[fp].compare_html)
else:
logger.error(f"文件: {fp} 的注释结果未能成功")
file_links = generate_file_link(preview_html_list) file_links = generate_file_link(preview_html_list)
yield from update_ui_lastest_msg( yield from update_ui_lastest_msg(
f"当前任务: <br/>{'<br/>'.join(tasks)}.<br/>" + f"剩余源文件数量: {remain}.\n\n" +
f"剩余源文件数量: {remain}.<br/>" + f"已完成的文件: {sum(worker_done)}.\n\n" +
f"已完成的文件: {sum(worker_done)}.<br/>" +
file_links + file_links +
"<br/>" + "\n\n" +
''.join(['.']*(cnt % 10 + 1) ''.join(['.']*(cnt % 10 + 1)
), chatbot=chatbot, history=history, delay=0) ), chatbot=chatbot, history=history, delay=0)
yield from update_ui(chatbot=chatbot, history=[]) # 刷新界面 yield from update_ui(chatbot=chatbot, history=[]) # 刷新界面
@@ -143,7 +120,6 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
@CatchException @CatchException
def 注释Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request): def 注释Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
history = [] # 清空历史,以免输入溢出 history = [] # 清空历史,以免输入溢出
plugin_kwargs["use_chinese"] = plugin_kwargs.get("use_chinese", False)
import glob, os import glob, os
if os.path.exists(txt): if os.path.exists(txt):
project_folder = txt project_folder = txt
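The threading pattern added in this diff pairs a `ThreadPoolExecutor` with a `WatchDog` that marks stalled tasks dead so each worker can abort itself. A minimal sketch of such a watchdog, assuming the constructor signature (`timeout`, `bark_fn`, `interval`, `msg`) and the `begin_watch`/`feed` interface seen above — the repository's actual implementation may differ:

```python
import threading
import time

class WatchDog:
    """Sketch of a watchdog timer: if feed() is not called within
    `timeout` seconds, `bark_fn` fires once. Interface assumed from
    the diff; not the repository's actual implementation."""
    def __init__(self, timeout, bark_fn, interval=1, msg=""):
        self.timeout = timeout
        self.bark_fn = bark_fn
        self.interval = interval
        self.msg = msg
        self.last_feed = time.time()
        self._stop = threading.Event()

    def begin_watch(self):
        def watch():
            while not self._stop.is_set():
                if time.time() - self.last_feed > self.timeout:
                    self.bark_fn()  # bark once, e.g. mark all tasks dead
                    break
                time.sleep(self.interval)
        threading.Thread(target=watch, daemon=True).start()

    def feed(self):
        # reset the countdown; the polling loop above calls this each cycle
        self.last_feed = time.time()
```

The worker threads then check the shared `tasks` list inside `observe_window_update` and raise `TimeoutError` once the watchdog has barked, which is how the timeout propagates into each thread.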

View file

@@ -1,36 +0,0 @@
from toolbox import get_conf, update_ui
from crazy_functions.plugin_template.plugin_class_template import GptAcademicPluginTemplate, ArgProperty
from crazy_functions.SourceCode_Comment import 注释Python项目
class SourceCodeComment_Wrap(GptAcademicPluginTemplate):
def __init__(self):
"""
请注意`execute`会执行在不同的线程中,因此您在定义和使用类变量时,应当慎之又慎!
"""
pass
def define_arg_selection_menu(self):
"""
定义插件的二级选项菜单
"""
gui_definition = {
"main_input":
ArgProperty(title="路径", description="程序路径(上传文件后自动填写)", default_value="", type="string").model_dump_json(), # 主输入,自动从输入框同步
"use_chinese":
ArgProperty(title="注释语言", options=["英文", "中文"], default_value="英文", description="", type="dropdown").model_dump_json(),
# "use_emoji":
# ArgProperty(title="在注释中使用emoji", options=["禁止", "允许"], default_value="禁止", description="无", type="dropdown").model_dump_json(),
}
return gui_definition
def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
"""
执行插件
"""
if plugin_kwargs["use_chinese"] == "中文":
plugin_kwargs["use_chinese"] = True
else:
plugin_kwargs["use_chinese"] = False
yield from 注释Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)

View file

@@ -68,7 +68,6 @@ Be aware:
1. You must NOT modify the indent of code. 1. You must NOT modify the indent of code.
2. You are NOT authorized to change or translate non-comment code, and you are NOT authorized to add empty lines either. 2. You are NOT authorized to change or translate non-comment code, and you are NOT authorized to add empty lines either.
3. Use {LANG} to add comments and docstrings. Do NOT translate Chinese that is already in the code. 3. Use {LANG} to add comments and docstrings. Do NOT translate Chinese that is already in the code.
4. Besides adding a docstring, use the ⭐ symbol to annotate the most core and important line of code within the function, explaining its role.
------------------ Example ------------------ ------------------ Example ------------------
INPUT: INPUT:
@@ -117,66 +116,10 @@ def zip_result(folder):
''' '''
revise_funtion_prompt_chinese = '''
您需要阅读以下代码,并根据以下说明修订源代码({FILE_BASENAME}):
1. 如果源代码中包含函数的话, 你应该分析给定函数实现了什么功能
2. 如果源代码中包含函数的话, 你需要为函数添加docstring, docstring必须使用中文
请注意:
1. 你不得修改代码的缩进
2. 你无权更改或翻译代码中的非注释部分,也不允许添加空行
3. 使用 {LANG} 添加注释和文档字符串。不要翻译代码中已有的中文
4. 除了添加docstring之外, 使用⭐符号给该函数中最核心、最重要的一行代码添加注释,并说明其作用
------------------ 示例 ------------------
INPUT:
```
L0000 |
L0001 |def zip_result(folder):
L0002 | t = gen_time_str()
L0003 | zip_folder(folder, get_log_folder(), f"result.zip")
L0004 | return os.path.join(get_log_folder(), f"result.zip")
L0005 |
L0006 |
```
OUTPUT:
<instruction_1_purpose>
该函数用于压缩指定文件夹,并返回生成的`zip`文件的路径。
</instruction_1_purpose>
<instruction_2_revised_code>
```
def zip_result(folder):
"""
该函数将指定的文件夹压缩成ZIP文件, 并将其存储在日志文件夹中。
输入参数:
folder (str): 需要压缩的文件夹的路径。
返回值:
str: 日志文件夹中创建的ZIP文件的路径。
"""
t = gen_time_str()
zip_folder(folder, get_log_folder(), f"result.zip") # ⭐ 执行文件夹的压缩
return os.path.join(get_log_folder(), f"result.zip")
```
</instruction_2_revised_code>
------------------ End of Example ------------------
------------------ the real INPUT you need to process NOW ({FILE_BASENAME}) ------------------
```
{THE_CODE}
```
{INDENT_REMINDER}
{BRIEF_REMINDER}
{HINT_REMINDER}
'''
class PythonCodeComment(): class PythonCodeComment():
def __init__(self, llm_kwargs, plugin_kwargs, language, observe_window_update) -> None: def __init__(self, llm_kwargs, language) -> None:
self.original_content = "" self.original_content = ""
self.full_context = [] self.full_context = []
self.full_context_with_line_no = [] self.full_context_with_line_no = []
@@ -184,13 +127,7 @@ class PythonCodeComment():
self.page_limit = 100 # 100 lines of code each page self.page_limit = 100 # 100 lines of code each page
self.ignore_limit = 20 self.ignore_limit = 20
self.llm_kwargs = llm_kwargs self.llm_kwargs = llm_kwargs
self.plugin_kwargs = plugin_kwargs
self.language = language self.language = language
self.observe_window_update = observe_window_update
if self.language == "chinese":
self.core_prompt = revise_funtion_prompt_chinese
else:
self.core_prompt = revise_funtion_prompt
self.path = None self.path = None
self.file_basename = None self.file_basename = None
self.file_brief = "" self.file_brief = ""
@@ -321,7 +258,7 @@ class PythonCodeComment():
hint_reminder = "" if hint is None else f"(Reminder: do not ignore or modify code such as `{hint}`, provide complete code in the OUTPUT.)" hint_reminder = "" if hint is None else f"(Reminder: do not ignore or modify code such as `{hint}`, provide complete code in the OUTPUT.)"
self.llm_kwargs['temperature'] = 0 self.llm_kwargs['temperature'] = 0
result = predict_no_ui_long_connection( result = predict_no_ui_long_connection(
inputs=self.core_prompt.format( inputs=revise_funtion_prompt.format(
LANG=self.language, LANG=self.language,
FILE_BASENAME=self.file_basename, FILE_BASENAME=self.file_basename,
THE_CODE=code, THE_CODE=code,
@@ -411,7 +348,6 @@ class PythonCodeComment():
try: try:
# yield from update_ui_lastest_msg(f"({self.file_basename}) 正在读取下一段代码片段:\n", chatbot=chatbot, history=history, delay=0) # yield from update_ui_lastest_msg(f"({self.file_basename}) 正在读取下一段代码片段:\n", chatbot=chatbot, history=history, delay=0)
next_batch, line_no_start, line_no_end = self.get_next_batch() next_batch, line_no_start, line_no_end = self.get_next_batch()
self.observe_window_update(f"正在处理{self.file_basename} - {line_no_start}/{len(self.full_context)}\n")
# yield from update_ui_lastest_msg(f"({self.file_basename}) 处理代码片段:\n\n{next_batch}", chatbot=chatbot, history=history, delay=0) # yield from update_ui_lastest_msg(f"({self.file_basename}) 处理代码片段:\n\n{next_batch}", chatbot=chatbot, history=history, delay=0)
hint = None hint = None

View file

@@ -1,47 +1,39 @@
import token import ast
import tokenize
import copy class CommentRemover(ast.NodeTransformer):
import io def visit_FunctionDef(self, node):
# 移除函数的文档字符串
if (node.body and isinstance(node.body[0], ast.Expr) and
isinstance(node.body[0].value, ast.Str)):
node.body = node.body[1:]
self.generic_visit(node)
return node
def visit_ClassDef(self, node):
# 移除类的文档字符串
if (node.body and isinstance(node.body[0], ast.Expr) and
isinstance(node.body[0].value, ast.Str)):
node.body = node.body[1:]
self.generic_visit(node)
return node
def visit_Module(self, node):
# 移除模块的文档字符串
if (node.body and isinstance(node.body[0], ast.Expr) and
isinstance(node.body[0].value, ast.Str)):
node.body = node.body[1:]
self.generic_visit(node)
return node
def remove_python_comments(input_source: str) -> str: def remove_python_comments(source_code):
source_flag = copy.copy(input_source) # 解析源代码为 AST
source = io.StringIO(input_source) tree = ast.parse(source_code)
ls = input_source.split('\n') # 移除注释
prev_toktype = token.INDENT transformer = CommentRemover()
readline = source.readline tree = transformer.visit(tree)
# 将处理后的 AST 转换回源代码
def get_char_index(lineno, col): return ast.unparse(tree)
# find the index of the char in the source code
if lineno == 1:
return len('\n'.join(ls[:(lineno-1)])) + col
else:
return len('\n'.join(ls[:(lineno-1)])) + col + 1
def replace_char_between(start_lineno, start_col, end_lineno, end_col, source, replace_char, ls):
# replace char between start_lineno, start_col and end_lineno, end_col with replace_char, but keep '\n' and ' '
b = get_char_index(start_lineno, start_col)
e = get_char_index(end_lineno, end_col)
for i in range(b, e):
if source[i] == '\n':
source = source[:i] + '\n' + source[i+1:]
elif source[i] == ' ':
source = source[:i] + ' ' + source[i+1:]
else:
source = source[:i] + replace_char + source[i+1:]
return source
tokgen = tokenize.generate_tokens(readline)
for toktype, ttext, (slineno, scol), (elineno, ecol), ltext in tokgen:
if toktype == token.STRING and (prev_toktype == token.INDENT):
source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
elif toktype == token.STRING and (prev_toktype == token.NEWLINE):
source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
elif toktype == tokenize.COMMENT:
source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
prev_toktype = toktype
return source_flag
# 示例使用 # 示例使用
if __name__ == "__main__": if __name__ == "__main__":
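The rewrite above replaces a `tokenize`-based comment stripper with an `ast.NodeTransformer`. One detail worth noting: `#` comments never appear in an AST at all, so `ast.unparse` drops them automatically — the transformer only has to remove docstrings. A compact sketch of the same idea, using `ast.Constant` (the `ast.Str` check used in the diff is deprecated since Python 3.8; `ast.unparse` requires Python 3.9+):

```python
import ast

class DocstringRemover(ast.NodeTransformer):
    """Sketch of the CommentRemover above, written against ast.Constant."""
    def _strip(self, node):
        # a docstring is a leading Expr whose value is a string constant
        if (node.body and isinstance(node.body[0], ast.Expr)
                and isinstance(node.body[0].value, ast.Constant)
                and isinstance(node.body[0].value.value, str)):
            node.body = node.body[1:] or [ast.Pass()]  # keep body non-empty
        self.generic_visit(node)
        return node

    visit_FunctionDef = visit_ClassDef = visit_Module = _strip

src = 'def f():\n    """doc"""\n    return 1  # inline comment\n'
out = ast.unparse(DocstringRemover().visit(ast.parse(src)))
# both the docstring and the inline comment are absent from `out`
```

Compared with the deleted `tokenize` version, this loses exact whitespace fidelity (the code is re-serialized), which is presumably acceptable for the plugin's purpose.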

View file

@@ -3,7 +3,7 @@ import re
import shutil import shutil
import numpy as np import numpy as np
from loguru import logger from loguru import logger
from toolbox import update_ui, update_ui_lastest_msg, get_log_folder, gen_time_str from toolbox import update_ui, update_ui_lastest_msg, get_log_folder
from toolbox import get_conf, promote_file_to_downloadzone from toolbox import get_conf, promote_file_to_downloadzone
from crazy_functions.latex_fns.latex_toolbox import PRESERVE, TRANSFORM from crazy_functions.latex_fns.latex_toolbox import PRESERVE, TRANSFORM
from crazy_functions.latex_fns.latex_toolbox import set_forbidden_text, set_forbidden_text_begin_end, set_forbidden_text_careful_brace from crazy_functions.latex_fns.latex_toolbox import set_forbidden_text, set_forbidden_text_begin_end, set_forbidden_text_careful_brace
@@ -468,70 +468,3 @@ def write_html(sp_file_contents, sp_file_result, chatbot, project_folder):
except: except:
from toolbox import trimmed_format_exc from toolbox import trimmed_format_exc
logger.error('writing html result failed:', trimmed_format_exc()) logger.error('writing html result failed:', trimmed_format_exc())
def upload_to_gptac_cloud_if_user_allow(chatbot, arxiv_id):
try:
# 如果用户允许,我们将arxiv论文PDF上传到GPTAC学术云
from toolbox import map_file_to_sha256
# 检查是否顺利,如果没有生成预期的文件,则跳过
is_result_good = False
for file_path in chatbot._cookies.get("files_to_promote", []):
if file_path.endswith('translate_zh.pdf'):
is_result_good = True
if not is_result_good:
return
# 上传文件
for file_path in chatbot._cookies.get("files_to_promote", []):
align_name = None
# normalized name
for name in ['translate_zh.pdf', 'comparison.pdf']:
if file_path.endswith(name): align_name = name
# if match any align name
if align_name:
logger.info(f'Uploading to GPTAC cloud as the user has set `allow_cloud_io`: {file_path}')
with open(file_path, 'rb') as f:
import requests
url = 'https://cloud-2.agent-matrix.com/arxiv_tf_paper_normal_upload'
files = {'file': (align_name, f, 'application/octet-stream')}
data = {
'arxiv_id': arxiv_id,
'file_hash': map_file_to_sha256(file_path),
'language': 'zh',
'trans_prompt': 'to_be_implemented',
'llm_model': 'to_be_implemented',
'llm_model_param': 'to_be_implemented',
}
resp = requests.post(url=url, files=files, data=data, timeout=30)
logger.info(f'Uploading terminate ({resp.status_code})`: {file_path}')
except:
# 如果上传失败,不会中断程序,因为这是次要功能
pass
def check_gptac_cloud(arxiv_id, chatbot):
import requests
success = False
downloaded = []
try:
for pdf_target in ['translate_zh.pdf', 'comparison.pdf']:
url = 'https://cloud-2.agent-matrix.com/arxiv_tf_paper_normal_exist'
data = {
'arxiv_id': arxiv_id,
'name': pdf_target,
}
resp = requests.post(url=url, data=data)
cache_hit_result = resp.text.strip('"')
if cache_hit_result.startswith("http"):
url = cache_hit_result
logger.info(f'Downloading from GPTAC cloud: {url}')
resp = requests.get(url=url, timeout=30)
target = os.path.join(get_log_folder(plugin_name='gptac_cloud'), gen_time_str(), pdf_target)
os.makedirs(os.path.dirname(target), exist_ok=True)
with open(target, 'wb') as f:
f.write(resp.content)
new_path = promote_file_to_downloadzone(target, chatbot=chatbot)
success = True
downloaded.append(new_path)
except:
pass
return success, downloaded

View file

@@ -6,16 +6,12 @@ class SafeUnpickler(pickle.Unpickler):
def get_safe_classes(self): def get_safe_classes(self):
from crazy_functions.latex_fns.latex_actions import LatexPaperFileGroup, LatexPaperSplit from crazy_functions.latex_fns.latex_actions import LatexPaperFileGroup, LatexPaperSplit
from crazy_functions.latex_fns.latex_toolbox import LinkedListNode from crazy_functions.latex_fns.latex_toolbox import LinkedListNode
from numpy.core.multiarray import scalar
from numpy import dtype
# 定义允许的安全类 # 定义允许的安全类
safe_classes = { safe_classes = {
# 在这里添加其他安全的类 # 在这里添加其他安全的类
'LatexPaperFileGroup': LatexPaperFileGroup, 'LatexPaperFileGroup': LatexPaperFileGroup,
'LatexPaperSplit': LatexPaperSplit, 'LatexPaperSplit': LatexPaperSplit,
'LinkedListNode': LinkedListNode, 'LinkedListNode': LinkedListNode,
'scalar': scalar,
'dtype': dtype,
} }
return safe_classes return safe_classes
@@ -26,6 +22,8 @@ class SafeUnpickler(pickle.Unpickler):
for class_name in self.safe_classes.keys(): for class_name in self.safe_classes.keys():
if (class_name in f'{module}.{name}'): if (class_name in f'{module}.{name}'):
match_class_name = class_name match_class_name = class_name
if module == 'numpy' or module.startswith('numpy.'):
return super().find_class(module, name)
if match_class_name is not None: if match_class_name is not None:
return self.safe_classes[match_class_name] return self.safe_classes[match_class_name]
# 如果尝试加载未授权的类,则抛出异常 # 如果尝试加载未授权的类,则抛出异常

View file

@@ -644,17 +644,8 @@ def run_in_subprocess(func):
def _merge_pdfs(pdf1_path, pdf2_path, output_path): def _merge_pdfs(pdf1_path, pdf2_path, output_path):
try:
logger.info("Merging PDFs using _merge_pdfs_ng")
_merge_pdfs_ng(pdf1_path, pdf2_path, output_path)
except:
logger.info("Merging PDFs using _merge_pdfs_legacy")
_merge_pdfs_legacy(pdf1_path, pdf2_path, output_path)
def _merge_pdfs_ng(pdf1_path, pdf2_path, output_path):
import PyPDF2 # PyPDF2这个库有严重的内存泄露问题,把它放到子进程中运行,从而方便内存的释放 import PyPDF2 # PyPDF2这个库有严重的内存泄露问题,把它放到子进程中运行,从而方便内存的释放
from PyPDF2.generic import NameObject, TextStringObject, ArrayObject, FloatObject, NumberObject from PyPDF2.generic import NameObject, TextStringObject,ArrayObject,FloatObject,NumberObject
Percent = 1 Percent = 1
# raise RuntimeError('PyPDF2 has a serious memory leak problem, please use other tools to merge PDF files.') # raise RuntimeError('PyPDF2 has a serious memory leak problem, please use other tools to merge PDF files.')
@@ -697,206 +688,65 @@ def _merge_pdfs_ng(pdf1_path, pdf2_path, output_path):
), ),
0, 0,
) )
if "/Annots" in new_page: if '/Annots' in page1:
annotations = new_page["/Annots"] page1_annot_id = [annot.idnum for annot in page1['/Annots']]
else:
page1_annot_id = []
if '/Annots' in page2:
page2_annot_id = [annot.idnum for annot in page2['/Annots']]
else:
page2_annot_id = []
if '/Annots' in new_page:
annotations = new_page['/Annots']
for i, annot in enumerate(annotations): for i, annot in enumerate(annotations):
annot_obj = annot.get_object() annot_obj = annot.get_object()
# 检查注释类型是否是链接(/Link # 检查注释类型是否是链接(/Link
if annot_obj.get("/Subtype") == "/Link": if annot_obj.get('/Subtype') == '/Link':
# 检查是否为内部链接跳转(/GoTo或外部URI链接/URI # 检查是否为内部链接跳转(/GoTo或外部URI链接/URI
action = annot_obj.get("/A") action = annot_obj.get('/A')
if action: if action:
if "/S" in action and action["/S"] == "/GoTo": if '/S' in action and action['/S'] == '/GoTo':
# 内部链接:跳转到文档中的某个页面 # 内部链接:跳转到文档中的某个页面
dest = action.get("/D") # 目标页或目标位置 dest = action.get('/D') # 目标页或目标位置
# if dest and annot.idnum in page2_annot_id: if dest and annot.idnum in page2_annot_id:
# if dest in pdf2_reader.named_destinations: # 获取原始文件中跳转信息,包括跳转页面
if dest and page2.annotations: destination = pdf2_reader.named_destinations[dest]
if annot in page2.annotations: page_number = pdf2_reader.get_destination_page_number(destination)
# 获取原始文件中跳转信息,包括跳转页面 #更新跳转信息,跳转到对应的页面和,指定坐标 (100, 150),缩放比例为 100%
destination = pdf2_reader.named_destinations[ #“/D”:[10,'/XYZ',100,100,0]
dest annot_obj['/A'].update({
] NameObject("/D"): ArrayObject([NumberObject(page_number),destination.dest_array[1], FloatObject(destination.dest_array[2] + int(page1.mediaBox.getWidth())) ,destination.dest_array[3],destination.dest_array[4]]) # 确保键和值是 PdfObject
page_number = ( })
pdf2_reader.get_destination_page_number( rect = annot_obj.get('/Rect')
destination # 更新点击坐标
) rect = ArrayObject([FloatObject(rect[0]+ int(page1.mediaBox.getWidth())),rect[1],
) FloatObject(rect[2]+int(page1.mediaBox.getWidth())),rect[3] ])
# 更新跳转信息,跳转到对应的页面和,指定坐标 (100, 150),缩放比例为 100% annot_obj.update({
# “/D”:[10,'/XYZ',100,100,0] NameObject("/Rect"): rect # 确保键和值是 PdfObject
if destination.dest_array[1] == "/XYZ": })
annot_obj["/A"].update( if dest and annot.idnum in page1_annot_id:
{ # 获取原始文件中跳转信息,包括跳转页面
NameObject("/D"): ArrayObject( destination = pdf1_reader.named_destinations[dest]
[ page_number = pdf1_reader.get_destination_page_number(destination)
NumberObject(page_number), #更新跳转信息,跳转到对应的页面和,指定坐标 (100, 150),缩放比例为 100%
destination.dest_array[1], #“/D”:[10,'/XYZ',100,100,0]
FloatObject( annot_obj['/A'].update({
destination.dest_array[ NameObject("/D"): ArrayObject([NumberObject(page_number),destination.dest_array[1], FloatObject(destination.dest_array[2]) ,destination.dest_array[3],destination.dest_array[4]]) # 确保键和值是 PdfObject
2 })
] rect = annot_obj.get('/Rect')
+ int( rect = ArrayObject([FloatObject(rect[0]),rect[1],
page1.mediaBox.getWidth() FloatObject(rect[2]),rect[3] ])
) annot_obj.update({
), NameObject("/Rect"): rect # 确保键和值是 PdfObject
destination.dest_array[3], })
destination.dest_array[4],
]
) # 确保键和值是 PdfObject
}
)
else:
annot_obj["/A"].update(
{
NameObject("/D"): ArrayObject(
[
NumberObject(page_number),
destination.dest_array[1],
]
) # 确保键和值是 PdfObject
}
)
rect = annot_obj.get("/Rect") elif '/S' in action and action['/S'] == '/URI':
# 更新点击坐标
rect = ArrayObject(
[
FloatObject(
rect[0]
+ int(page1.mediaBox.getWidth())
),
rect[1],
FloatObject(
rect[2]
+ int(page1.mediaBox.getWidth())
),
rect[3],
]
)
annot_obj.update(
{
NameObject(
"/Rect"
): rect # 确保键和值是 PdfObject
}
)
# if dest and annot.idnum in page1_annot_id:
# if dest in pdf1_reader.named_destinations:
if dest and page1.annotations:
if annot in page1.annotations:
# 获取原始文件中跳转信息,包括跳转页面
destination = pdf1_reader.named_destinations[
dest
]
page_number = (
pdf1_reader.get_destination_page_number(
destination
)
)
# 更新跳转信息,跳转到对应的页面和,指定坐标 (100, 150),缩放比例为 100%
# “/D”:[10,'/XYZ',100,100,0]
if destination.dest_array[1] == "/XYZ":
annot_obj["/A"].update(
{
NameObject("/D"): ArrayObject(
[
NumberObject(page_number),
destination.dest_array[1],
FloatObject(
destination.dest_array[
2
]
),
destination.dest_array[3],
destination.dest_array[4],
]
) # 确保键和值是 PdfObject
}
)
else:
annot_obj["/A"].update(
{
NameObject("/D"): ArrayObject(
[
NumberObject(page_number),
destination.dest_array[1],
]
) # 确保键和值是 PdfObject
}
)
rect = annot_obj.get("/Rect")
rect = ArrayObject(
[
FloatObject(rect[0]),
rect[1],
FloatObject(rect[2]),
rect[3],
]
)
annot_obj.update(
{
NameObject(
"/Rect"
): rect # 确保键和值是 PdfObject
}
)
elif "/S" in action and action["/S"] == "/URI":
# 外部链接跳转到某个URI # 外部链接跳转到某个URI
uri = action.get("/URI") uri = action.get('/URI')
output_writer.addPage(new_page) output_writer.addPage(new_page)
# Save the merged PDF file
with open(output_path, "wb") as output_file:
output_writer.write(output_file)
def _merge_pdfs_legacy(pdf1_path, pdf2_path, output_path):
import PyPDF2 # PyPDF2这个库有严重的内存泄露问题,把它放到子进程中运行,从而方便内存的释放
Percent = 0.95
# raise RuntimeError('PyPDF2 has a serious memory leak problem, please use other tools to merge PDF files.')
# Open the first PDF file
with open(pdf1_path, "rb") as pdf1_file:
pdf1_reader = PyPDF2.PdfFileReader(pdf1_file)
# Open the second PDF file
with open(pdf2_path, "rb") as pdf2_file:
pdf2_reader = PyPDF2.PdfFileReader(pdf2_file)
# Create a new PDF file to store the merged pages
output_writer = PyPDF2.PdfFileWriter()
# Determine the number of pages in each PDF file
num_pages = max(pdf1_reader.numPages, pdf2_reader.numPages)
# Merge the pages from the two PDF files
for page_num in range(num_pages):
# Add the page from the first PDF file
if page_num < pdf1_reader.numPages:
page1 = pdf1_reader.getPage(page_num)
else:
page1 = PyPDF2.PageObject.createBlankPage(pdf1_reader)
# Add the page from the second PDF file
if page_num < pdf2_reader.numPages:
page2 = pdf2_reader.getPage(page_num)
else:
page2 = PyPDF2.PageObject.createBlankPage(pdf1_reader)
# Create a new empty page with double width
new_page = PyPDF2.PageObject.createBlankPage(
width=int(
int(page1.mediaBox.getWidth())
+ int(page2.mediaBox.getWidth()) * Percent
),
height=max(page1.mediaBox.getHeight(), page2.mediaBox.getHeight()),
)
new_page.mergeTranslatedPage(page1, 0, 0)
new_page.mergeTranslatedPage(
page2,
int(
int(page1.mediaBox.getWidth())
- int(page2.mediaBox.getWidth()) * (1 - Percent)
),
0,
)
output_writer.addPage(new_page) output_writer.addPage(new_page)
# Save the merged PDF file # Save the merged PDF file
with open(output_path, "wb") as output_file: with open(output_path, "wb") as output_file:
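The restored legacy merge path places the two pages side by side: the merged page is `w1 + w2 * Percent` wide and page 2 is translated by `w1 - w2 * (1 - Percent)`, so with `Percent = 0.95` the right page overlaps the left by 5% of its own width. The geometry in isolation (a sketch of the arithmetic only, not of the PyPDF2 calls):

```python
def merged_page_geometry(w1, h1, w2, h2, percent=0.95):
    """Page layout used by the side-by-side merge: returns the merged
    page's (width, height) and the x-offset at which page 2 is drawn."""
    width = int(w1 + w2 * percent)
    height = max(h1, h2)
    x_offset_page2 = int(w1 - w2 * (1 - percent))
    return width, height, x_offset_page2
```

With `percent=1` (the `_merge_pdfs_ng` path above) there is no overlap and page 2 starts exactly at `w1`; this is also why the `/Rect` and `/D` link coordinates from the second document must be shifted right by page 1's width.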

View file

@@ -4,9 +4,7 @@ from toolbox import promote_file_to_downloadzone, extract_archive
from toolbox import generate_file_link, zip_folder from toolbox import generate_file_link, zip_folder
from crazy_functions.crazy_utils import get_files_from_everything from crazy_functions.crazy_utils import get_files_from_everything
from shared_utils.colorful import * from shared_utils.colorful import *
from loguru import logger
import os import os
import time
def refresh_key(doc2x_api_key): def refresh_key(doc2x_api_key):
import requests, json import requests, json
@@ -24,140 +22,105 @@ def refresh_key(doc2x_api_key):
raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text))) raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
return doc2x_api_key return doc2x_api_key
def 解析PDF_DOC2X_转Latex(pdf_file_path): def 解析PDF_DOC2X_转Latex(pdf_file_path):
zip_file_path, unzipped_folder = 解析PDF_DOC2X(pdf_file_path, format='tex')
return unzipped_folder
def 解析PDF_DOC2X(pdf_file_path, format='tex'):
"""
format: 'tex', 'md', 'docx'
"""
import requests, json, os import requests, json, os
DOC2X_API_KEY = get_conf('DOC2X_API_KEY') DOC2X_API_KEY = get_conf('DOC2X_API_KEY')
latex_dir = get_log_folder(plugin_name="pdf_ocr_latex") latex_dir = get_log_folder(plugin_name="pdf_ocr_latex")
markdown_dir = get_log_folder(plugin_name="pdf_ocr")
doc2x_api_key = DOC2X_API_KEY doc2x_api_key = DOC2X_API_KEY
if doc2x_api_key.startswith('sk-'):
url = "https://api.doc2x.noedgeai.com/api/v1/pdf"
# < ------ 第1步上传 ------ >
logger.info("Doc2x 第1步上传")
with open(pdf_file_path, 'rb') as file:
res = requests.post(
"https://v2.doc2x.noedgeai.com/api/v2/parse/pdf",
headers={"Authorization": "Bearer " + doc2x_api_key},
data=file
)
# res_json = []
if res.status_code == 200:
res_json = res.json()
else: else:
raise RuntimeError(f"Doc2x return an error: {res.json()}") doc2x_api_key = refresh_key(doc2x_api_key)
uuid = res_json['data']['uid'] url = "https://api.doc2x.noedgeai.com/api/platform/pdf"
# < ------ 第2步轮询等待 ------ >
logger.info("Doc2x 第2步轮询等待")
params = {'uid': uuid}
while True:
res = requests.get(
'https://v2.doc2x.noedgeai.com/api/v2/parse/status',
headers={"Authorization": "Bearer " + doc2x_api_key},
params=params
)
res_json = res.json()
if res_json['data']['status'] == "success":
break
elif res_json['data']['status'] == "processing":
time.sleep(3)
logger.info(f"Doc2x is processing at {res_json['data']['progress']}%")
elif res_json['data']['status'] == "failed":
raise RuntimeError(f"Doc2x return an error: {res_json}")
# < ------ 第3步提交转化 ------ >
logger.info("Doc2x 第3步提交转化")
data = {
"uid": uuid,
"to": format,
"formula_mode": "dollar",
"filename": "output"
}
res = requests.post( res = requests.post(
'https://v2.doc2x.noedgeai.com/api/v2/convert/parse', url,
headers={"Authorization": "Bearer " + doc2x_api_key}, files={"file": open(pdf_file_path, "rb")},
json=data data={"ocr": "1"},
headers={"Authorization": "Bearer " + doc2x_api_key}
) )
res_json = []
if res.status_code == 200: if res.status_code == 200:
res_json = res.json() decoded = res.content.decode("utf-8")
for z_decoded in decoded.split('\n'):
if len(z_decoded) == 0: continue
assert z_decoded.startswith("data: ")
z_decoded = z_decoded[len("data: "):]
decoded_json = json.loads(z_decoded)
res_json.append(decoded_json)
else: else:
raise RuntimeError(f"Doc2x return an error: {res.json()}") raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
uuid = res_json[0]['uuid']
to = "latex" # latex, md, docx
url = "https://api.doc2x.noedgeai.com/api/export"+"?request_id="+uuid+"&to="+to
# < ------ 第4步等待结果 ------ > res = requests.get(url, headers={"Authorization": "Bearer " + doc2x_api_key})
logger.info("Doc2x 第4步等待结果") latex_zip_path = os.path.join(latex_dir, gen_time_str() + '.zip')
params = {'uid': uuid} latex_unzip_path = os.path.join(latex_dir, gen_time_str())
while True: if res.status_code == 200:
res = requests.get( with open(latex_zip_path, "wb") as f: f.write(res.content)
'https://v2.doc2x.noedgeai.com/api/v2/convert/parse/result', else:
headers={"Authorization": "Bearer " + doc2x_api_key}, raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
params=params
)
res_json = res.json()
if res_json['data']['status'] == "success":
break
elif res_json['data']['status'] == "processing":
time.sleep(3)
logger.info(f"Doc2x still processing")
elif res_json['data']['status'] == "failed":
raise RuntimeError(f"Doc2x return an error: {res_json}")
# < ------ 第5步最后的处理 ------ >
logger.info("Doc2x 第5步最后的处理")
if format=='tex':
target_path = latex_dir
if format=='md':
target_path = markdown_dir
os.makedirs(target_path, exist_ok=True)
max_attempt = 3
# < ------ 下载 ------ >
for attempt in range(max_attempt):
try:
result_url = res_json['data']['url']
res = requests.get(result_url)
zip_path = os.path.join(target_path, gen_time_str() + '.zip')
unzip_path = os.path.join(target_path, gen_time_str())
if res.status_code == 200:
with open(zip_path, "wb") as f: f.write(res.content)
else:
raise RuntimeError(f"Doc2x return an error: {res.json()}")
except Exception as e:
if attempt < max_attempt - 1:
logger.error(f"Failed to download latex file, retrying... {e}")
time.sleep(3)
continue
else:
raise e
# < ------ 解压 ------ >
import zipfile import zipfile
with zipfile.ZipFile(zip_path, 'r') as zip_ref: with zipfile.ZipFile(latex_zip_path, 'r') as zip_ref:
zip_ref.extractall(unzip_path) zip_ref.extractall(latex_unzip_path)
return zip_path, unzip_path
return latex_unzip_path
 def 解析PDF_DOC2X_单文件(fp, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, DOC2X_API_KEY, user_request):
     def pdf2markdown(filepath):
-        chatbot.append((None, f"Doc2x 解析中"))
+        import requests, json, os
+        markdown_dir = get_log_folder(plugin_name="pdf_ocr")
+        doc2x_api_key = DOC2X_API_KEY
+        if doc2x_api_key.startswith('sk-'):
+            url = "https://api.doc2x.noedgeai.com/api/v1/pdf"
+        else:
+            doc2x_api_key = refresh_key(doc2x_api_key)
+            url = "https://api.doc2x.noedgeai.com/api/platform/pdf"
+        chatbot.append((None, "加载PDF文件,发送至DOC2X解析..."))
         yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
-        md_zip_path, unzipped_folder = 解析PDF_DOC2X(filepath, format='md')
+        res = requests.post(
+            url,
+            files={"file": open(filepath, "rb")},
+            data={"ocr": "1"},
+            headers={"Authorization": "Bearer " + doc2x_api_key}
+        )
+        res_json = []
+        if res.status_code == 200:
+            decoded = res.content.decode("utf-8")
+            for z_decoded in decoded.split('\n'):
+                if len(z_decoded) == 0: continue
+                assert z_decoded.startswith("data: ")
+                z_decoded = z_decoded[len("data: "):]
+                decoded_json = json.loads(z_decoded)
+                res_json.append(decoded_json)
+                if 'limit exceeded' in decoded_json.get('status', ''):
+                    raise RuntimeError("Doc2x API 页数受限,请联系 Doc2x 方面,并更换新的 API 秘钥。")
+        else:
+            raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
+        uuid = res_json[0]['uuid']
+        to = "md" # latex, md, docx
+        url = "https://api.doc2x.noedgeai.com/api/export"+"?request_id="+uuid+"&to="+to
+        chatbot.append((None, f"读取解析: {url} ..."))
+        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
+        res = requests.get(url, headers={"Authorization": "Bearer " + doc2x_api_key})
+        md_zip_path = os.path.join(markdown_dir, gen_time_str() + '.zip')
+        if res.status_code == 200:
+            with open(md_zip_path, "wb") as f: f.write(res.content)
+        else:
+            raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
         promote_file_to_downloadzone(md_zip_path, chatbot=chatbot)
         chatbot.append((None, f"完成解析 {md_zip_path} ..."))
         yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
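The inline version above parses Doc2x's streaming reply, which arrives as newline-separated `data: {json}` records. That parsing step in isolation, as a sketch (`parse_sse_lines` is an illustrative name, not a repo function):

```python
import json

def parse_sse_lines(decoded: str):
    # 解析 "data: {...}" 形式的流式响应,每行一个 JSON 对象
    res_json = []
    for line in decoded.split('\n'):
        if len(line) == 0:
            continue
        assert line.startswith("data: "), f"unexpected line: {line!r}"
        payload = json.loads(line[len("data: "):])
        if 'limit exceeded' in payload.get('status', ''):
            raise RuntimeError("Doc2x API page limit exceeded")
        res_json.append(payload)
    return res_json
```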


@@ -180,7 +180,6 @@ version: '3'
 services:
   gpt_academic_with_latex:
     image: ghcr.io/binary-husky/gpt_academic_with_latex:master # (Auto Built by Dockerfile: docs/GithubAction+NoLocal+Latex)
-    # 对于ARM64设备,请将以上镜像名称替换为 ghcr.io/binary-husky/gpt_academic_with_latex_arm:master
     environment:
       # 请查阅 `config.py` 以查看所有的配置信息
       API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '

docs/Dockerfile+JittorLLM 普通文件

@@ -0,0 +1 @@
# 此Dockerfile不再维护,请前往docs/GithubAction+JittorLLMs


@@ -0,0 +1,57 @@
# docker build -t gpt-academic-all-capacity -f docs/GithubAction+AllCapacity --network=host --build-arg http_proxy=http://localhost:10881 --build-arg https_proxy=http://localhost:10881 .
# docker build -t gpt-academic-all-capacity -f docs/GithubAction+AllCapacityBeta --network=host .
# docker run -it --net=host gpt-academic-all-capacity bash
# 从NVIDIA源,从而支持显卡检查宿主的nvidia-smi中的cuda版本必须>=11.3
FROM fuqingxu/11.3.1-runtime-ubuntu20.04-with-texlive:latest
# edge-tts需要的依赖,某些pip包所需的依赖
RUN apt update && apt install ffmpeg build-essential -y
# use python3 as the system default python
WORKDIR /gpt
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
# # 非必要步骤,更换pip源 (以下三行,可以删除)
# RUN echo '[global]' > /etc/pip.conf && \
# echo 'index-url = https://mirrors.aliyun.com/pypi/simple/' >> /etc/pip.conf && \
# echo 'trusted-host = mirrors.aliyun.com' >> /etc/pip.conf
# 下载pytorch
RUN python3 -m pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
# 准备pip依赖
RUN python3 -m pip install openai numpy arxiv rich
RUN python3 -m pip install colorama Markdown pygments pymupdf
RUN python3 -m pip install python-docx moviepy pdfminer
RUN python3 -m pip install zh_langchain==0.2.1 pypinyin
RUN python3 -m pip install rarfile py7zr
RUN python3 -m pip install aliyun-python-sdk-core==2.13.3 pyOpenSSL webrtcvad scipy git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git
# 下载分支
WORKDIR /gpt
RUN git clone --depth=1 https://github.com/binary-husky/gpt_academic.git
WORKDIR /gpt/gpt_academic
RUN git clone --depth=1 https://github.com/OpenLMLab/MOSS.git request_llms/moss
RUN python3 -m pip install -r requirements.txt
RUN python3 -m pip install -r request_llms/requirements_moss.txt
RUN python3 -m pip install -r request_llms/requirements_qwen.txt
RUN python3 -m pip install -r request_llms/requirements_chatglm.txt
RUN python3 -m pip install -r request_llms/requirements_newbing.txt
RUN python3 -m pip install nougat-ocr
# 预热Tiktoken模块
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
# 安装知识库插件的额外依赖
RUN apt-get update && apt-get install libgl1 -y
RUN pip3 install transformers protobuf langchain sentence-transformers faiss-cpu nltk beautifulsoup4 bitsandbytes tabulate icetk --upgrade
RUN pip3 install unstructured[all-docs] --upgrade
RUN python3 -c 'from check_proxy import warm_up_vectordb; warm_up_vectordb()'
RUN rm -rf /usr/local/lib/python3.8/dist-packages/tests
# COPY .cache /root/.cache
# COPY config_private.py config_private.py
# 启动
CMD ["python3", "-u", "main.py"]
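The `RUN python3 -c 'from check_proxy import warm_up_modules; ...'` steps above bake slow first-time initialization (module imports, tokenizer downloads) into the image so containers start fast. A generic sketch of such a warm-up helper; `warm_up` is an illustrative name, not the repo's `warm_up_modules`:

```python
import importlib

def warm_up(module_names):
    # 逐个导入模块,触发其首次初始化/下载;失败不阻断构建,仅返回失败清单
    failed = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:
            failed.append(name)
    return failed
```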


@@ -1,34 +1,35 @@
-# 此Dockerfile适用于"无本地模型"的环境构建,如果需要使用chatglm等本地模型,请参考 docs/Dockerfile+ChatGLM
+# 此Dockerfile适用于无本地模型的环境构建,如果需要使用chatglm等本地模型,请参考 docs/Dockerfile+ChatGLM
 # - 1 修改 `config.py`
 # - 2 构建 docker build -t gpt-academic-nolocal-latex -f docs/GithubAction+NoLocal+Latex .
 # - 3 运行 docker run -v /home/fuqingxu/arxiv_cache:/root/arxiv_cache --rm -it --net=host gpt-academic-nolocal-latex
-FROM menghuan1918/ubuntu_uv_ctex:latest
-ENV DEBIAN_FRONTEND=noninteractive
-SHELL ["/bin/bash", "-c"]
+FROM fuqingxu/python311_texlive_ctex:latest
+ENV PATH "$PATH:/usr/local/texlive/2022/bin/x86_64-linux"
+ENV PATH "$PATH:/usr/local/texlive/2023/bin/x86_64-linux"
+ENV PATH "$PATH:/usr/local/texlive/2024/bin/x86_64-linux"
+ENV PATH "$PATH:/usr/local/texlive/2025/bin/x86_64-linux"
+ENV PATH "$PATH:/usr/local/texlive/2026/bin/x86_64-linux"
+# 指定路径
 WORKDIR /gpt
-# 先复制依赖文件
-COPY requirements.txt .
+RUN pip3 install openai numpy arxiv rich
+RUN pip3 install colorama Markdown pygments pymupdf
+RUN pip3 install python-docx pdfminer
+RUN pip3 install nougat-ocr
+# 装载项目文件
+COPY . .
 # 安装依赖
-RUN pip install --break-system-packages openai numpy arxiv rich colorama Markdown pygments pymupdf python-docx pdfminer \
-    && pip install --break-system-packages -r requirements.txt \
-    && if [ "$(uname -m)" = "x86_64" ]; then \
-        pip install --break-system-packages nougat-ocr; \
-    fi \
-    && pip cache purge \
-    && rm -rf /root/.cache/pip/*
-# 创建非root用户
-RUN useradd -m gptuser && chown -R gptuser /gpt
-USER gptuser
-# 最后才复制代码文件,这样代码更新时只需重建最后几层,可以大幅减少docker pull所需的大小
-COPY --chown=gptuser:gptuser . .
+RUN pip3 install -r requirements.txt
+# edge-tts需要的依赖
+RUN apt update && apt install ffmpeg -y
 # 可选步骤,用于预热模块
 RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
 # 启动
 CMD ["python3", "-u", "main.py"]


@@ -256,8 +256,6 @@ model_info = {
         "max_token": 128000,
         "tokenizer": tokenizer_gpt4,
         "token_cnt": get_token_num_gpt4,
-        "openai_disable_system_prompt": True,
-        "openai_disable_stream": True,
     },
     "o1-mini": {
         "fn_with_ui": chatgpt_ui,
@@ -266,8 +264,6 @@ model_info = {
         "max_token": 128000,
         "tokenizer": tokenizer_gpt4,
         "token_cnt": get_token_num_gpt4,
-        "openai_disable_system_prompt": True,
-        "openai_disable_stream": True,
     },
     "gpt-4-turbo": {
@@ -385,14 +381,6 @@ model_info = {
         "tokenizer": tokenizer_gpt35,
         "token_cnt": get_token_num_gpt35,
     },
-    "glm-4-plus":{
-        "fn_with_ui": zhipu_ui,
-        "fn_without_ui": zhipu_noui,
-        "endpoint": None,
-        "max_token": 10124 * 8,
-        "tokenizer": tokenizer_gpt35,
-        "token_cnt": get_token_num_gpt35,
-    },
     # api_2d (此后不需要在此处添加api2d的接口了,因为下面的代码会自动添加)
     "api2d-gpt-4": {
@@ -1293,3 +1281,4 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot,
     # 更新一下llm_kwargs的参数,否则会出现参数不匹配的问题
     yield from method(inputs, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, stream, additional_fn)
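The `model_info` hunks above edit a plain dict registry, and the trailing comment notes that `api2d-*` entries are generated automatically rather than listed by hand. A minimal sketch of that auto-derivation; the function name and endpoint URL are illustrative, not the repo's exact code:

```python
def derive_api2d_entries(model_info: dict, api2d_endpoint: str):
    # 为 OpenAI 系列模型自动生成对应的 api2d-* 入口,复用原有配置
    derived = {}
    for name, info in model_info.items():
        if name.startswith("gpt-") or name.startswith("o1-"):
            entry = dict(info)            # 浅拷贝,避免改动原配置
            entry["endpoint"] = api2d_endpoint
            derived["api2d-" + name] = entry
    return derived
```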


@@ -202,13 +202,10 @@ def predict_no_ui_long_connection(inputs:str, llm_kwargs:dict, history:list=[],
             if (time.time()-observe_window[1]) > watch_dog_patience:
                 raise RuntimeError("用户取消了程序。")
         else: raise RuntimeError("意外Json结构"+delta)
-    finish_reason = json_data.get('finish_reason', None) if json_data else None
-    if finish_reason == 'content_filter':
-        raise RuntimeError("由于提问含不合规内容被过滤。")
-    if finish_reason == 'length':
+    if json_data and json_data['finish_reason'] == 'content_filter':
+        raise RuntimeError("由于提问含不合规内容被Azure过滤。")
+    if json_data and json_data['finish_reason'] == 'length':
         raise ConnectionAbortedError("正常结束,但显示Token不足,导致输出不完整,请削减单次输入的文本量。")
     return result
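One variant above indexes `json_data['finish_reason']` directly, which raises `KeyError` when the field is absent; the other reads it once with `.get()`. The defensive pattern in isolation, as a sketch with a hypothetical function name and English messages:

```python
def check_finish_reason(json_data):
    # 用 .get() 读取 finish_reason,字段缺失时得到 None 而不是 KeyError
    finish_reason = json_data.get('finish_reason', None) if json_data else None
    if finish_reason == 'content_filter':
        raise RuntimeError("request was blocked by the content filter")
    if finish_reason == 'length':
        raise ConnectionAbortedError("output truncated: token limit reached")
    return finish_reason
```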
@@ -341,7 +338,7 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
                 # 前者是API2D的结束条件,后者是OPENAI的结束条件
                 if ('data: [DONE]' in chunk_decoded) or (len(chunkjson['choices'][0]["delta"]) == 0):
                     # 判定为数据流的结束,gpt_replying_buffer也写完了
-                    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
+                    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
                     break
                 # 处理数据流的主体
                 status_text = f"finish_reason: {chunkjson['choices'][0].get('finish_reason', 'null')}"
@@ -375,7 +372,7 @@ def handle_o1_model_special(response, inputs, llm_kwargs, chatbot, history):
     try:
         chunkjson = json.loads(response.content.decode())
         gpt_replying_buffer = chunkjson['choices'][0]["message"]["content"]
-        log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
+        log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
         history[-1] = gpt_replying_buffer
         chatbot[-1] = (history[-2], history[-1])
         yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
@@ -539,3 +536,4 @@ def generate_payload(inputs:str, llm_kwargs:dict, history:list, system_prompt:st
     return headers,payload


@@ -184,7 +184,7 @@ def predict(inputs, llm_kwargs, plugin_kwargs, chatbot, history=[], system_promp
                     # 判定为数据流的结束,gpt_replying_buffer也写完了
                     lastmsg = chatbot[-1][-1] + f"\n\n\n\n「{llm_kwargs['llm_model']}调用结束,该模型不具备上下文对话能力,如需追问,请及时切换模型。」"
                     yield from update_ui_lastest_msg(lastmsg, chatbot, history, delay=1)
-                    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
+                    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
                     break
                 # 处理数据流的主体
                 status_text = f"finish_reason: {chunkjson['choices'][0].get('finish_reason', 'null')}"

@@ -216,7 +216,7 @@ def predict(inputs, llm_kwargs, plugin_kwargs, chatbot, history=[], system_promp
             if need_to_pass:
                 pass
             elif is_last_chunk:
-                log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
+                log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
                 # logger.info(f'[response] {gpt_replying_buffer}')
                 break
             else:

@@ -223,7 +223,7 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
                 chatbot[-1] = (history[-2], history[-1])
                 yield from update_ui(chatbot=chatbot, history=history, msg="正常") # 刷新界面
             if chunkjson['event_type'] == 'stream-end':
-                log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
+                log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
                 history[-1] = gpt_replying_buffer
                 chatbot[-1] = (history[-2], history[-1])
                 yield from update_ui(chatbot=chatbot, history=history, msg="正常") # 刷新界面

@@ -109,7 +109,7 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
                 gpt_replying_buffer += paraphrase['text'] # 使用 json 解析库进行处理
                 chatbot[-1] = (inputs, gpt_replying_buffer)
                 history[-1] = gpt_replying_buffer
-                log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
+                log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
                 yield from update_ui(chatbot=chatbot, history=history)
             if error_match:
                 history = history[-2] # 错误的不纳入对话

@@ -166,7 +166,7 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
             history = history[:-2]
         yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
         break
-    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_bro_result, user_name=chatbot.get_user())
+    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_bro_result)
 def predict_no_ui_long_connection(inputs, llm_kwargs, history=[], sys_prompt="", observe_window=None,
                                   console_slience=False):


@@ -337,7 +337,7 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
                 # 前者是API2D的结束条件,后者是OPENAI的结束条件
                 if ('data: [DONE]' in chunk_decoded) or (len(chunkjson['choices'][0]["delta"]) == 0):
                     # 判定为数据流的结束,gpt_replying_buffer也写完了
-                    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
+                    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
                     break
                 # 处理数据流的主体
                 status_text = f"finish_reason: {chunkjson['choices'][0].get('finish_reason', 'null')}"
@@ -371,7 +371,7 @@ def handle_o1_model_special(response, inputs, llm_kwargs, chatbot, history):
     try:
         chunkjson = json.loads(response.content.decode())
         gpt_replying_buffer = chunkjson['choices'][0]["message"]["content"]
-        log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
+        log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
         history[-1] = gpt_replying_buffer
         chatbot[-1] = (history[-2], history[-1])
         yield from update_ui(chatbot=chatbot, history=history) # 刷新界面


@@ -59,7 +59,7 @@ def predict(inputs, llm_kwargs, plugin_kwargs, chatbot, history=[], system_promp
         chatbot[-1] = (inputs, response)
         yield from update_ui(chatbot=chatbot, history=history)
-    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=response, user_name=chatbot.get_user())
+    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=response)
     # 总结输出
     if response == f"[Local Message] 等待{model_name}响应中 ...":
         response = f"[Local Message] {model_name}响应异常 ..."


@@ -68,5 +68,5 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
     chatbot[-1] = [inputs, response]
     yield from update_ui(chatbot=chatbot, history=history)
     history.extend([inputs, response])
-    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=response, user_name=chatbot.get_user())
+    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=response)
     yield from update_ui(chatbot=chatbot, history=history)


@@ -97,5 +97,5 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
     chatbot[-1] = [inputs, response]
     yield from update_ui(chatbot=chatbot, history=history)
     history.extend([inputs, response])
-    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=response, user_name=chatbot.get_user())
+    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=response)
     yield from update_ui(chatbot=chatbot, history=history)


@@ -2,15 +2,14 @@ https://public.agent-matrix.com/publish/gradio-3.32.10-py3-none-any.whl
 fastapi==0.110
 gradio-client==0.8
 pypdf2==2.12.1
-httpx<=0.25.2
 zhipuai==2.0.1
 tiktoken>=0.3.3
 requests[socks]
-pydantic==2.9.2
+pydantic==2.5.2
+llama-index~=0.10
 protobuf==3.20
 transformers>=4.27.1,<4.42
 scipdf_parser>=0.52
-spacy==3.7.4
 anthropic>=0.18.1
 python-markdown-math
 pymdown-extensions
@@ -33,14 +32,3 @@ loguru
 arxiv
 numpy
 rich
-llama-index-core==0.10.68
-llama-index-legacy==0.9.48
-llama-index-readers-file==0.1.33
-llama-index-readers-llama-parse==0.1.6
-llama-index-embeddings-azure-openai==0.1.10
-llama-index-embeddings-openai==0.1.10
-llama-parse==0.4.9
-mdit-py-plugins>=0.3.3
-linkify-it-py==2.0.3


@@ -138,9 +138,7 @@ def start_app(app_block, CONCURRENT_COUNT, AUTHENTICATION, PORT, SSL_KEYFILE, SS
     app_block.is_sagemaker = False
     gradio_app = App.create_app(app_block)
-    for route in list(gradio_app.router.routes):
-        if route.path == "/proxy={url_path:path}":
-            gradio_app.router.routes.remove(route)
     # --- --- replace gradio endpoint to forbid access to sensitive files --- ---
     if len(AUTHENTICATION) > 0:
         dependencies = []
@@ -156,13 +154,9 @@ def start_app(app_block, CONCURRENT_COUNT, AUTHENTICATION, PORT, SSL_KEYFILE, SS
         @gradio_app.head("/file={path_or_url:path}", dependencies=dependencies)
         @gradio_app.get("/file={path_or_url:path}", dependencies=dependencies)
         async def file(path_or_url: str, request: fastapi.Request):
-            if not _authorize_user(path_or_url, request, gradio_app):
-                return "越权访问!"
-            stripped = path_or_url.lstrip().lower()
-            if stripped.startswith("https://") or stripped.startswith("http://"):
-                return "账户密码授权模式下, 禁止链接!"
-            if '../' in stripped:
-                return "非法路径!"
+            if len(AUTHENTICATION) > 0:
+                if not _authorize_user(path_or_url, request, gradio_app):
+                    return "越权访问!"
             return await endpoint(path_or_url, request)

         from fastapi import Request, status
@@ -173,26 +167,6 @@ def start_app(app_block, CONCURRENT_COUNT, AUTHENTICATION, PORT, SSL_KEYFILE, SS
             response.delete_cookie('access-token')
             response.delete_cookie('access-token-unsecure')
             return response
-    else:
-        dependencies = []
-        endpoint = None
-        for route in list(gradio_app.router.routes):
-            if route.path == "/file/{path:path}":
-                gradio_app.router.routes.remove(route)
-            if route.path == "/file={path_or_url:path}":
-                dependencies = route.dependencies
-                endpoint = route.endpoint
-                gradio_app.router.routes.remove(route)
-        @gradio_app.get("/file/{path:path}", dependencies=dependencies)
-        @gradio_app.head("/file={path_or_url:path}", dependencies=dependencies)
-        @gradio_app.get("/file={path_or_url:path}", dependencies=dependencies)
-        async def file(path_or_url: str, request: fastapi.Request):
-            stripped = path_or_url.lstrip().lower()
-            if stripped.startswith("https://") or stripped.startswith("http://"):
-                return "账户密码授权模式下, 禁止链接!"
-            if '../' in stripped:
-                return "非法路径!"
-            return await endpoint(path_or_url, request)

     # --- --- enable TTS (text-to-speech) functionality --- ---
     TTS_TYPE = get_conf("TTS_TYPE")
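The `/file=` endpoint code above rejects absolute URLs and `../` sequences before delegating to gradio's original handler. The same checks as a standalone predicate, for illustration (the function name is not from the repo):

```python
def is_safe_file_request(path_or_url: str) -> bool:
    # 与上面端点中的检查一致:拒绝外部链接与路径穿越
    stripped = path_or_url.lstrip().lower()
    if stripped.startswith("https://") or stripped.startswith("http://"):
        return False   # 禁止链接
    if '../' in stripped:
        return False   # 非法路径
    return True
```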


@@ -104,27 +104,17 @@ def extract_archive(file_path, dest_dir):
             logger.info("Successfully extracted zip archive to {}".format(dest_dir))
     elif file_extension in [".tar", ".gz", ".bz2"]:
-        try:
-            with tarfile.open(file_path, "r:*") as tarobj:
-                # 清理提取路径,移除任何不安全的元素
-                for member in tarobj.getmembers():
-                    member_path = os.path.normpath(member.name)
-                    full_path = os.path.join(dest_dir, member_path)
-                    full_path = os.path.abspath(full_path)
-                    if not full_path.startswith(os.path.abspath(dest_dir) + os.sep):
-                        raise Exception(f"Attempted Path Traversal in {member.name}")
-                tarobj.extractall(path=dest_dir)
-            logger.info("Successfully extracted tar archive to {}".format(dest_dir))
-        except tarfile.ReadError as e:
-            if file_extension == ".gz":
-                # 一些特别奇葩的项目,是一个gz文件,里面不是tar,只有一个tex文件
-                import gzip
-                with gzip.open(file_path, 'rb') as f_in:
-                    with open(os.path.join(dest_dir, 'main.tex'), 'wb') as f_out:
-                        f_out.write(f_in.read())
-            else:
-                raise e
+        with tarfile.open(file_path, "r:*") as tarobj:
+            # 清理提取路径,移除任何不安全的元素
+            for member in tarobj.getmembers():
+                member_path = os.path.normpath(member.name)
+                full_path = os.path.join(dest_dir, member_path)
+                full_path = os.path.abspath(full_path)
+                if not full_path.startswith(os.path.abspath(dest_dir) + os.sep):
+                    raise Exception(f"Attempted Path Traversal in {member.name}")
+            tarobj.extractall(path=dest_dir)
+        logger.info("Successfully extracted tar archive to {}".format(dest_dir))

     # 第三方库,需要预先pip install rarfile
     # 此外,Windows上还需要安装winrar软件,配置其Path环境变量,如"C:\Program Files\WinRAR"才可以
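One side of the hunk above additionally handles "bare gzip" files that are not tar archives at all (a `.gz` wrapping a single `.tex` file). A self-contained sketch of the member-path check plus that gzip fallback; the function name and `fallback_name` default are illustrative:

```python
import gzip
import os
import tarfile

def safe_extract_tar(file_path, dest_dir, fallback_name='main.tex'):
    # 先校验每个成员的规范化路径仍位于 dest_dir 内,防止路径穿越
    os.makedirs(dest_dir, exist_ok=True)
    try:
        with tarfile.open(file_path, "r:*") as tarobj:
            for member in tarobj.getmembers():
                full_path = os.path.abspath(os.path.join(dest_dir, os.path.normpath(member.name)))
                if not full_path.startswith(os.path.abspath(dest_dir) + os.sep):
                    raise Exception(f"Attempted Path Traversal in {member.name}")
            tarobj.extractall(path=dest_dir)
    except tarfile.ReadError:
        # 个别 .gz 文件并非 tar 包,而是单个被 gzip 压缩的文件
        with gzip.open(file_path, 'rb') as f_in:
            with open(os.path.join(dest_dir, fallback_name), 'wb') as f_out:
                f_out.write(f_in.read())
```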


@@ -14,7 +14,6 @@ openai_regex = re.compile(
     r"sk-[a-zA-Z0-9_-]{92}$|" +
     r"sk-proj-[a-zA-Z0-9_-]{48}$|"+
     r"sk-proj-[a-zA-Z0-9_-]{124}$|"+
-    r"sk-proj-[a-zA-Z0-9_-]{156}$|"+ # 新版apikey位数不匹配故修改此正则表达式
     r"sess-[a-zA-Z0-9]{40}$"
 )
 def is_openai_api_key(key):
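The hunk above edits one alternative of `openai_regex`, which validates key formats purely by prefix and length. A reduced sketch of the same idea, keeping only three of the alternatives shown in the diff (the full pattern set lives in the repo):

```python
import re

# 与上文相同思路的精简版:按前缀 + 长度校验 key 格式
openai_key_regex = re.compile(
    r"sk-[a-zA-Z0-9_-]{92}$|" +
    r"sk-proj-[a-zA-Z0-9_-]{48}$|" +
    r"sess-[a-zA-Z0-9]{40}$"
)

def looks_like_openai_key(key: str) -> bool:
    return bool(openai_key_regex.match(key.strip()))
```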


@@ -1,12 +0,0 @@
"""
对项目中的各个插件进行测试。运行方法:直接运行 python tests/test_plugins.py
"""
import init_test
import os, sys
if __name__ == "__main__":
from test_utils import plugin_test
plugin_test(plugin='crazy_functions.数学动画生成manim->动画生成', main_input="A point moving along function culve y=sin(x), starting from x=0 and stop at x=4*\pi.")


@@ -1,7 +0,0 @@
import init_test
from crazy_functions.pdf_fns.parse_pdf_via_doc2x import 解析PDF_DOC2X_转Latex
# 解析PDF_DOC2X_转Latex("gpt_log/arxiv_cache_old/2410.10819/workfolder/merge.pdf")
# 解析PDF_DOC2X_转Latex("gpt_log/arxiv_cache_ooo/2410.07095/workfolder/merge.pdf")
解析PDF_DOC2X_转Latex("2410.11190v2.pdf")


@@ -1029,7 +1029,7 @@ def check_repeat_upload(new_pdf_path, pdf_hash):
     # 如果所有页的内容都相同,返回 True
     return False, None
-def log_chat(llm_model: str, input_str: str, output_str: str, user_name: str=default_user_name):
+def log_chat(llm_model: str, input_str: str, output_str: str):
     try:
         if output_str and input_str and llm_model:
             uid = str(uuid.uuid4().hex)
@@ -1038,8 +1038,8 @@ def log_chat(llm_model: str, input_str: str, output_str: str, user_name: str=def
             logger.bind(chat_msg=True).info(dedent(
                 """
                 ╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
-                [UID/USER]
-                {uid}/{user_name}
+                [UID]
+                {uid}
                 [Model]
                 {llm_model}
                 [Query]
@@ -1047,6 +1047,6 @@ def log_chat(llm_model: str, input_str: str, output_str: str, user_name: str=def
                 [Response]
                 {output_str}
                 ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-                """).format(uid=uid, user_name=user_name, llm_model=llm_model, input_str=input_str, output_str=output_str))
+                """).format(uid=uid, llm_model=llm_model, input_str=input_str, output_str=output_str))
     except:
         logger.error(trimmed_format_exc())
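The two `log_chat` variants above differ only in whether a `user_name` field appears in the boxed loguru message. The template mechanics can be sketched with the standard library alone; `render_chat_record` is an illustrative name, and the box-drawing rows are omitted for brevity:

```python
import uuid
from textwrap import dedent

def render_chat_record(llm_model, input_str, output_str, user_name=None):
    # 生成与上文相同结构的聊天记录文本;user_name 为可选字段
    uid = str(uuid.uuid4().hex)
    label = "[UID/USER]" if user_name else "[UID]"
    head = f"{uid}/{user_name}" if user_name else uid
    return dedent("""
        {label}
        {head}
        [Model]
        {llm_model}
        [Query]
        {input_str}
        [Response]
        {output_str}
        """).format(label=label, head=head, llm_model=llm_model,
                    input_str=input_str, output_str=output_str)
```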


@@ -1,5 +1,5 @@
 {
-    "version": 3.90,
+    "version": 3.83,
     "show_feature": true,
-    "new_feature": "增加RAG组件 <-> 升级多合一主提交键"
+    "new_feature": "增加欢迎页面 <-> 优化图像生成插件 <-> 添加紫东太初大模型支持 <-> 保留主题选择 <-> 支持更复杂的插件框架 <-> 上传文件时显示进度条"
 }