Mirrored from https://github.com/binary-husky/gpt_academic.git
Synced 2025-12-06 22:46:48 +00:00

Comparing commits: chat_log_n ... frontier_w (18 commits)
| Author | SHA1 | Commit date |
|---|---|---|
| | 7415d532d1 | |
| | 97eef45ab7 | |
| | 0c0e2acb9b | |
| | 9fba8e0142 | |
| | 7d7867fb64 | |
| | f9dbaa39fb | |
| | bbc2288c5b | |
| | 64ab916838 | |
| | 8fe559da9f | |
| | 09fd22091a | |
| | e296719b23 | |
| | 2f343179a2 | |
| | 4d9604f2e9 | |
| | bbf9e9f868 | |
| | aa1f967dd7 | |
| | 0d082327c8 | |
| | 80acd9c875 | |
| | 17cd4f8210 | |
@@ -1,14 +1,14 @@
 # https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
-name: build-with-latex-arm
+name: build-with-all-capacity-beta

 on:
   push:
     branches:
-      - "master"
+      - 'master'

 env:
   REGISTRY: ghcr.io
-  IMAGE_NAME: ${{ github.repository }}_with_latex_arm
+  IMAGE_NAME: ${{ github.repository }}_with_all_capacity_beta

 jobs:
   build-and-push-image:
@@ -18,17 +18,11 @@ jobs:
       packages: write

     steps:
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v3
-
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-
       - name: Checkout repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v3

       - name: Log in to the Container registry
-        uses: docker/login-action@v3
+        uses: docker/login-action@v2
         with:
           registry: ${{ env.REGISTRY }}
           username: ${{ github.actor }}
@@ -41,11 +35,10 @@ jobs:
           images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

       - name: Build and push Docker image
-        uses: docker/build-push-action@v6
+        uses: docker/build-push-action@v4
         with:
           context: .
           push: true
-          platforms: linux/arm64
-          file: docs/GithubAction+NoLocal+Latex
+          file: docs/GithubAction+AllCapacityBeta
           tags: ${{ steps.meta.outputs.tags }}
           labels: ${{ steps.meta.outputs.labels }}
44  .github/workflows/build-with-jittorllms.yml  (vendored, regular file)
@@ -0,0 +1,44 @@
+# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
+name: build-with-jittorllms
+
+on:
+  push:
+    branches:
+      - 'master'
+
+env:
+  REGISTRY: ghcr.io
+  IMAGE_NAME: ${{ github.repository }}_jittorllms
+
+jobs:
+  build-and-push-image:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v3
+
+      - name: Log in to the Container registry
+        uses: docker/login-action@v2
+        with:
+          registry: ${{ env.REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Extract metadata (tags, labels) for Docker
+        id: meta
+        uses: docker/metadata-action@v4
+        with:
+          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+
+      - name: Build and push Docker image
+        uses: docker/build-push-action@v4
+        with:
+          context: .
+          push: true
+          file: docs/GithubAction+JittorLLMs
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
2  .gitignore  (vendored)
@@ -161,5 +161,3 @@ temp.*
 objdump*
 *.min.*.js
 TODO
-experimental_mods
-search_results
@@ -1,6 +1,5 @@
 > [!IMPORTANT]
-> 2024.10.10: 突发停电,紧急恢复了提供[whl包](https://drive.google.com/file/d/19U_hsLoMrjOlQSzYS3pzWX9fTzyusArP/view?usp=sharing)的文件服务器
-> 2024.10.8: 版本3.90加入对llama-index的初步支持,版本3.80加入插件二级菜单功能(详见wiki)
+> 2024.6.1: 版本3.80加入插件二级菜单功能(详见wiki)
 > 2024.5.1: 加入Doc2x翻译PDF论文的功能,[查看详情](https://github.com/binary-husky/gpt_academic/wiki/Doc2x)
 > 2024.3.11: 全力支持Qwen、GLM、DeepseekCoder等中文大语言模型! SoVits语音克隆模块,[查看详情](https://www.bilibili.com/video/BV1Rp421S7tF/)
 > 2024.1.17: 安装依赖时,请选择`requirements.txt`中**指定的版本**。 安装命令:`pip install -r requirements.txt`。本项目完全开源免费,您可通过订阅[在线服务](https://github.com/binary-husky/gpt_academic/wiki/online)的方式鼓励本项目的发展。
@@ -1,36 +1,24 @@
 from loguru import logger

 def check_proxy(proxies, return_ip=False):
-    """
-    检查代理配置并返回结果。
-
-    Args:
-        proxies (dict): 包含http和https代理配置的字典。
-        return_ip (bool, optional): 是否返回代理的IP地址。默认为False。
-
-    Returns:
-        str or None: 检查的结果信息或代理的IP地址(如果`return_ip`为True)。
-    """
     import requests
     proxies_https = proxies['https'] if proxies is not None else '无'
     ip = None
     try:
-        response = requests.get("https://ipapi.co/json/", proxies=proxies, timeout=4) # ⭐ 执行GET请求以获取代理信息
+        response = requests.get("https://ipapi.co/json/", proxies=proxies, timeout=4)
         data = response.json()
         if 'country_name' in data:
             country = data['country_name']
             result = f"代理配置 {proxies_https}, 代理所在地:{country}"
-            if 'ip' in data:
-                ip = data['ip']
+            if 'ip' in data: ip = data['ip']
         elif 'error' in data:
-            alternative, ip = _check_with_backup_source(proxies) # ⭐ 调用备用方法检查代理配置
+            alternative, ip = _check_with_backup_source(proxies)
             if alternative is None:
                 result = f"代理配置 {proxies_https}, 代理所在地:未知,IP查询频率受限"
             else:
                 result = f"代理配置 {proxies_https}, 代理所在地:{alternative}"
         else:
             result = f"代理配置 {proxies_https}, 代理数据解析失败:{data}"

     if not return_ip:
         logger.warning(result)
         return result
@@ -45,33 +33,17 @@ def check_proxy(proxies, return_ip=False):
     return ip

 def _check_with_backup_source(proxies):
-    """
-    通过备份源检查代理,并获取相应信息。
-
-    Args:
-        proxies (dict): 包含代理信息的字典。
-
-    Returns:
-        tuple: 代理信息(geo)和IP地址(ip)的元组。
-    """
     import random, string, requests
     random_string = ''.join(random.choices(string.ascii_letters + string.digits, k=32))
     try:
-        res_json = requests.get(f"http://{random_string}.edns.ip-api.com/json", proxies=proxies, timeout=4).json() # ⭐ 执行代理检查和备份源请求
+        res_json = requests.get(f"http://{random_string}.edns.ip-api.com/json", proxies=proxies, timeout=4).json()
        return res_json['dns']['geo'], res_json['dns']['ip']
     except:
         return None, None

 def backup_and_download(current_version, remote_version):
     """
-    一键更新协议:备份当前版本,下载远程版本并解压缩。
-
-    Args:
-        current_version (str): 当前版本号。
-        remote_version (str): 远程版本号。
-
-    Returns:
-        str: 新版本目录的路径。
+    一键更新协议:备份和下载
     """
     from toolbox import get_conf
     import shutil
@@ -88,7 +60,7 @@ def backup_and_download(current_version, remote_version):
     proxies = get_conf('proxies')
     try: r = requests.get('https://github.com/binary-husky/chatgpt_academic/archive/refs/heads/master.zip', proxies=proxies, stream=True)
     except: r = requests.get('https://public.agent-matrix.com/publish/master.zip', proxies=proxies, stream=True)
-    zip_file_path = backup_dir+'/master.zip' # ⭐ 保存备份文件的路径
+    zip_file_path = backup_dir+'/master.zip'
     with open(zip_file_path, 'wb+') as f:
         f.write(r.content)
     dst_path = new_version_dir
@@ -104,17 +76,6 @@ def backup_and_download(current_version, remote_version):
 def patch_and_restart(path):
     """
     一键更新协议:覆盖和重启
-
-    Args:
-        path (str): 新版本代码所在的路径
-
-    注意事项:
-        如果您的程序没有使用config_private.py私密配置文件,则会将config.py重命名为config_private.py以避免配置丢失。
-
-    更新流程:
-    - 复制最新版本代码到当前目录
-    - 更新pip包依赖
-    - 如果更新失败,则提示手动安装依赖库并重启
     """
     from distutils import dir_util
     import shutil
@@ -123,43 +84,32 @@ def patch_and_restart(path):
     import time
     import glob
     from shared_utils.colorful import log亮黄, log亮绿, log亮红
+    # if not using config_private, move origin config.py as config_private.py
     if not os.path.exists('config_private.py'):
         log亮黄('由于您没有设置config_private.py私密配置,现将您的现有配置移动至config_private.py以防止配置丢失,',
               '另外您可以随时在history子文件夹下找回旧版的程序。')
         shutil.copyfile('config.py', 'config_private.py')

     path_new_version = glob.glob(path + '/*-master')[0]
-    dir_util.copy_tree(path_new_version, './') # ⭐ 将最新版本代码复制到当前目录
+    dir_util.copy_tree(path_new_version, './')

     log亮绿('代码已经更新,即将更新pip包依赖……')
     for i in reversed(range(5)): time.sleep(1); log亮绿(i)

     try:
         import subprocess
         subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-r', 'requirements.txt'])
     except:
         log亮红('pip包依赖安装出现问题,需要手动安装新增的依赖库 `python -m pip install -r requirements.txt`,然后在用常规的`python main.py`的方式启动。')

     log亮绿('更新完成,您可以随时在history子文件夹下找回旧版的程序,5s之后重启')
     log亮红('假如重启失败,您可能需要手动安装新增的依赖库 `python -m pip install -r requirements.txt`,然后在用常规的`python main.py`的方式启动。')
     log亮绿(' ------------------------------ -----------------------------------')

     for i in reversed(range(8)): time.sleep(1); log亮绿(i)
-    os.execl(sys.executable, sys.executable, *sys.argv) # 重启程序
+    os.execl(sys.executable, sys.executable, *sys.argv)


 def get_current_version():
-    """
-    获取当前的版本号。
-
-    Returns:
-        str: 当前的版本号。如果无法获取版本号,则返回空字符串。
-    """
     import json
     try:
         with open('./version', 'r', encoding='utf8') as f:
-            current_version = json.loads(f.read())['version'] # ⭐ 从读取的json数据中提取版本号
+            current_version = json.loads(f.read())['version']
     except:
         current_version = ""
     return current_version
@@ -168,12 +118,6 @@ def get_current_version():
 def auto_update(raise_error=False):
     """
     一键更新协议:查询版本和用户意见
-
-    Args:
-        raise_error (bool, optional): 是否在出错时抛出错误。默认为 False。
-
-    Returns:
-        None
     """
     try:
         from toolbox import get_conf
@@ -193,13 +137,13 @@ def auto_update(raise_error=False):
         current_version = json.loads(current_version)['version']
         if (remote_version - current_version) >= 0.01-1e-5:
             from shared_utils.colorful import log亮黄
-            log亮黄(f'\n新版本可用。新版本:{remote_version},当前版本:{current_version}。{new_feature}') # ⭐ 在控制台打印新版本信息
+            log亮黄(f'\n新版本可用。新版本:{remote_version},当前版本:{current_version}。{new_feature}')
             logger.info('(1)Github更新地址:\nhttps://github.com/binary-husky/chatgpt_academic\n')
             user_instruction = input('(2)是否一键更新代码(Y+回车=确认,输入其他/无输入+回车=不更新)?')
             if user_instruction in ['Y', 'y']:
-                path = backup_and_download(current_version, remote_version) # ⭐ 备份并下载文件
+                path = backup_and_download(current_version, remote_version)
                 try:
-                    patch_and_restart(path) # ⭐ 执行覆盖并重启操作
+                    patch_and_restart(path)
                 except:
                     msg = '更新失败。'
                     if raise_error:
@@ -219,9 +163,6 @@ def auto_update(raise_error=False):
         logger.info(msg)

 def warm_up_modules():
-    """
-    预热模块,加载特定模块并执行预热操作。
-    """
     logger.info('正在执行一些模块的预热 ...')
     from toolbox import ProxyNetworkActivate
     from request_llms.bridge_all import model_info
@@ -232,16 +173,6 @@ def warm_up_modules():
     enc.encode("模块预热", disallowed_special=())

 def warm_up_vectordb():
-    """
-    执行一些模块的预热操作。
-
-    本函数主要用于执行一些模块的预热操作,确保在后续的流程中能够顺利运行。
-
-    ⭐ 关键作用:预热模块
-
-    Returns:
-        None
-    """
     logger.info('正在执行一些模块的预热 ...')
     from toolbox import ProxyNetworkActivate
     with ProxyNetworkActivate("Warmup_Modules"):
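The `check_proxy` hunks above mostly strip docstrings and inline ⭐ comments; the lookup logic itself is unchanged. Below is a minimal, self-contained sketch of that result-assembly logic, with a hypothetical helper name and a plain dict standing in for the live ipapi.co response:

```python
# Hypothetical helper: the real check_proxy fetches this dict via
# requests.get("https://ipapi.co/json/", proxies=proxies, timeout=4).json()
def summarize_proxy_lookup(data, proxies_https="none"):
    """Turn an ipapi.co-style JSON payload into a (message, ip) pair."""
    ip = None
    if 'country_name' in data:
        if 'ip' in data:
            ip = data['ip']
        return f"proxy {proxies_https}, located in {data['country_name']}", ip
    if 'error' in data:
        # the real code falls back to a secondary edns.ip-api.com source here
        return f"proxy {proxies_https}, location unknown (rate limited)", ip
    return f"proxy {proxies_https}, failed to parse: {data}", ip
```

The real function additionally logs the message via loguru and returns either the message or the bare IP depending on `return_ip`.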
@@ -6,6 +6,7 @@ from loguru import logger
 def get_crazy_functions():
     from crazy_functions.读文章写摘要 import 读文章写摘要
     from crazy_functions.生成函数注释 import 批量生成函数注释
+    from crazy_functions.Rag_Interface import Rag问答
     from crazy_functions.SourceCode_Analyse import 解析项目本身
     from crazy_functions.SourceCode_Analyse import 解析一个Python项目
     from crazy_functions.SourceCode_Analyse import 解析一个Matlab项目
@@ -49,9 +50,15 @@ def get_crazy_functions():
     from crazy_functions.Image_Generate import 图片生成_DALLE2, 图片生成_DALLE3, 图片修改_DALLE2
     from crazy_functions.Image_Generate_Wrap import ImageGen_Wrap
     from crazy_functions.SourceCode_Comment import 注释Python项目
-    from crazy_functions.SourceCode_Comment_Wrap import SourceCodeComment_Wrap

     function_plugins = {
+        "Rag智能召回": {
+            "Group": "对话",
+            "Color": "stop",
+            "AsButton": False,
+            "Info": "将问答数据记录到向量库中,作为长期参考。",
+            "Function": HotReload(Rag问答),
+        },
         "虚空终端": {
             "Group": "对话|编程|学术|智能体",
             "Color": "stop",
@@ -72,7 +79,6 @@ def get_crazy_functions():
             "AsButton": False,
             "Info": "上传一系列python源文件(或者压缩包), 为这些代码添加docstring | 输入参数为路径",
             "Function": HotReload(注释Python项目),
-            "Class": SourceCodeComment_Wrap,
         },
         "载入对话历史存档(先上传存档或输入路径)": {
             "Group": "对话",
@@ -701,31 +707,6 @@ def get_crazy_functions():
         logger.error(trimmed_format_exc())
         logger.error("Load function plugin failed")

-    try:
-        from crazy_functions.Rag_Interface import Rag问答
-
-        function_plugins.update(
-            {
-                "Rag智能召回": {
-                    "Group": "对话",
-                    "Color": "stop",
-                    "AsButton": False,
-                    "Info": "将问答数据记录到向量库中,作为长期参考。",
-                    "Function": HotReload(Rag问答),
-                },
-            }
-        )
-    except:
-        logger.error(trimmed_format_exc())
-        logger.error("Load function plugin failed")
-
-
-
-
-
-
-
     # try:
     #     from crazy_functions.高级功能函数模板 import 测试图表渲染
     #     function_plugins.update({
@@ -3,7 +3,7 @@ from toolbox import CatchException, report_exception, update_ui_lastest_msg, zip
 from functools import partial
 from loguru import logger

-import glob, os, requests, time, json, tarfile, threading
+import glob, os, requests, time, json, tarfile

 pj = os.path.join
 ARXIV_CACHE_DIR = get_conf("ARXIV_CACHE_DIR")
@@ -138,43 +138,25 @@ def arxiv_download(chatbot, history, txt, allow_cache=True):
     cached_translation_pdf = check_cached_translation_pdf(arxiv_id)
     if cached_translation_pdf and allow_cache: return cached_translation_pdf, arxiv_id

-    extract_dst = pj(ARXIV_CACHE_DIR, arxiv_id, 'extract')
+    url_tar = url_.replace('/abs/', '/e-print/')
     translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'e-print')
-    dst = pj(translation_dir, arxiv_id + '.tar')
+    extract_dst = pj(ARXIV_CACHE_DIR, arxiv_id, 'extract')
     os.makedirs(translation_dir, exist_ok=True)
-    # <-------------- download arxiv source file ------------->

-    def fix_url_and_download():
-        # for url_tar in [url_.replace('/abs/', '/e-print/'), url_.replace('/abs/', '/src/')]:
-        for url_tar in [url_.replace('/abs/', '/src/'), url_.replace('/abs/', '/e-print/')]:
+    # <-------------- download arxiv source file ------------->
+    dst = pj(translation_dir, arxiv_id + '.tar')
+    if os.path.exists(dst):
+        yield from update_ui_lastest_msg("调用缓存", chatbot=chatbot, history=history) # 刷新界面
+    else:
+        yield from update_ui_lastest_msg("开始下载", chatbot=chatbot, history=history) # 刷新界面
         proxies = get_conf('proxies')
         r = requests.get(url_tar, proxies=proxies)
-        if r.status_code == 200:
         with open(dst, 'wb+') as f:
             f.write(r.content)
-            return True
-        return False
-
-    if os.path.exists(dst) and allow_cache:
-        yield from update_ui_lastest_msg(f"调用缓存 {arxiv_id}", chatbot=chatbot, history=history) # 刷新界面
-        success = True
-    else:
-        yield from update_ui_lastest_msg(f"开始下载 {arxiv_id}", chatbot=chatbot, history=history) # 刷新界面
-        success = fix_url_and_download()
-    yield from update_ui_lastest_msg(f"下载完成 {arxiv_id}", chatbot=chatbot, history=history) # 刷新界面

-    if not success:
-        yield from update_ui_lastest_msg(f"下载失败 {arxiv_id}", chatbot=chatbot, history=history)
-        raise tarfile.ReadError(f"论文下载失败 {arxiv_id}")
-
     # <-------------- extract file ------------->
+    yield from update_ui_lastest_msg("下载完成", chatbot=chatbot, history=history) # 刷新界面
     from toolbox import extract_archive
-    try:
-        extract_archive(file_path=dst, dest_dir=extract_dst)
-    except tarfile.ReadError:
-        os.remove(dst)
-        raise tarfile.ReadError(f"论文下载失败")
+    extract_archive(file_path=dst, dest_dir=extract_dst)
     return extract_dst, arxiv_id
@@ -338,17 +320,11 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
     # <-------------- more requirements ------------->
     if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
     more_req = plugin_kwargs.get("advanced_arg", "")
-    no_cache = ("--no-cache" in more_req)
-    if no_cache: more_req = more_req.replace("--no-cache", "").strip()
-
-    allow_gptac_cloud_io = ("--allow-cloudio" in more_req) # 从云端下载翻译结果,以及上传翻译结果到云端
-    if allow_gptac_cloud_io: more_req = more_req.replace("--allow-cloudio", "").strip()
-
+    no_cache = more_req.startswith("--no-cache")
+    if no_cache: more_req.lstrip("--no-cache")
     allow_cache = not no_cache
     _switch_prompt_ = partial(switch_prompt, more_requirement=more_req)

     # <-------------- check deps ------------->
     try:
         import glob, os, time, subprocess
@@ -375,20 +351,6 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
         yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
         return

-    # #################################################################
-    if allow_gptac_cloud_io and arxiv_id:
-        # 访问 GPTAC学术云,查询云端是否存在该论文的翻译版本
-        from crazy_functions.latex_fns.latex_actions import check_gptac_cloud
-        success, downloaded = check_gptac_cloud(arxiv_id, chatbot)
-        if success:
-            chatbot.append([
-                f"检测到GPTAC云端存在翻译版本, 如果不满意翻译结果, 请禁用云端分享, 然后重新执行。",
-                None
-            ])
-            yield from update_ui(chatbot=chatbot, history=history)
-            return
-    #################################################################

     if os.path.exists(txt):
         project_folder = txt
     else:
@@ -426,21 +388,14 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
     # <-------------- zip PDF ------------->
     zip_res = zip_result(project_folder)
     if success:
-        if allow_gptac_cloud_io and arxiv_id:
-            # 如果用户允许,我们将翻译好的arxiv论文PDF上传到GPTAC学术云
-            from crazy_functions.latex_fns.latex_actions import upload_to_gptac_cloud_if_user_allow
-            threading.Thread(target=upload_to_gptac_cloud_if_user_allow,
-                             args=(chatbot, arxiv_id), daemon=True).start()
-
         chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
-        yield from update_ui(chatbot=chatbot, history=history)
+        yield from update_ui(chatbot=chatbot, history=history);
         time.sleep(1) # 刷新界面
         promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)

     else:
         chatbot.append((f"失败了",
             '虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 您可以到Github Issue区, 用该压缩包进行反馈。如系统是Linux,请检查系统字体(见Github wiki) ...'))
-        yield from update_ui(chatbot=chatbot, history=history)
+        yield from update_ui(chatbot=chatbot, history=history);
         time.sleep(1) # 刷新界面
         promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
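One detail worth noting in the "more requirements" hunk: the incoming side calls `more_req.lstrip("--no-cache")`, but `str.lstrip` strips a set of leading *characters* rather than a prefix, and its return value is discarded there, so the string is not actually rewritten; the removed side's `replace(...).strip()` form is the variant that does. A small sketch of that flag-popping pattern, with a hypothetical helper name:

```python
def pop_flag(arg_string, flag):
    """Return (flag_present, arg_string with the flag removed).

    str.lstrip(flag) would strip any of the flag's characters from the
    left edge, not the literal prefix, so use replace() (as the removed
    branch in the diff does) or explicit slicing instead.
    """
    if flag in arg_string:
        return True, arg_string.replace(flag, "").strip()
    return False, arg_string
```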
@@ -30,8 +30,6 @@ class Arxiv_Localize(GptAcademicPluginTemplate):
                 default_value="", type="string").model_dump_json(), # 高级参数输入区,自动同步
             "allow_cache":
                 ArgProperty(title="是否允许从缓存中调取结果", options=["允许缓存", "从头执行"], default_value="允许缓存", description="无", type="dropdown").model_dump_json(),
-            "allow_cloudio":
-                ArgProperty(title="是否允许从GPTAC学术云下载(或者上传)翻译结果(仅针对Arxiv论文)", options=["允许", "禁止"], default_value="禁止", description="共享文献,互助互利", type="dropdown").model_dump_json(),
         }
         return gui_definition

@@ -40,14 +38,9 @@ class Arxiv_Localize(GptAcademicPluginTemplate):
         执行插件
         """
         allow_cache = plugin_kwargs["allow_cache"]
-        allow_cloudio = plugin_kwargs["allow_cloudio"]
         advanced_arg = plugin_kwargs["advanced_arg"]

         if allow_cache == "从头执行": plugin_kwargs["advanced_arg"] = "--no-cache " + plugin_kwargs["advanced_arg"]
-
-        # 从云端下载翻译结果,以及上传翻译结果到云端;人人为我,我为人人。
-        if allow_cloudio == "允许": plugin_kwargs["advanced_arg"] = "--allow-cloudio " + plugin_kwargs["advanced_arg"]

         yield from Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
@@ -65,7 +65,7 @@ def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
         pfg.file_contents.append(file_content)

     # <-------- 拆分过长的Markdown文件 ---------->
-    pfg.run_file_split(max_token_limit=1024)
+    pfg.run_file_split(max_token_limit=2048)
     n_split = len(pfg.sp_file_contents)

     # <-------- 多线程翻译开始 ---------->
@@ -2,7 +2,20 @@ from toolbox import CatchException, update_ui, get_conf, get_log_folder, update_
 from crazy_functions.crazy_utils import input_clipping
 from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive

+VECTOR_STORE_TYPE = "Milvus"
+
+if VECTOR_STORE_TYPE == "Milvus":
+    try:
+        from crazy_functions.rag_fns.milvus_worker import MilvusRagWorker as LlamaIndexRagWorker
+    except:
+        VECTOR_STORE_TYPE = "Simple"
+
+if VECTOR_STORE_TYPE == "Simple":
+    from crazy_functions.rag_fns.llama_index_worker import LlamaIndexRagWorker
+
+
 RAG_WORKER_REGISTER = {}

 MAX_HISTORY_ROUND = 5
 MAX_CONTEXT_TOKEN_LIMIT = 4096
 REMEMBER_PREVIEW = 1000
@@ -10,16 +23,6 @@ REMEMBER_PREVIEW = 1000
 @CatchException
 def Rag问答(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
-
-    # import vector store lib
-    VECTOR_STORE_TYPE = "Milvus"
-    if VECTOR_STORE_TYPE == "Milvus":
-        try:
-            from crazy_functions.rag_fns.milvus_worker import MilvusRagWorker as LlamaIndexRagWorker
-        except:
-            VECTOR_STORE_TYPE = "Simple"
-    if VECTOR_STORE_TYPE == "Simple":
-        from crazy_functions.rag_fns.llama_index_worker import LlamaIndexRagWorker

     # 1. we retrieve rag worker from global context
     user_name = chatbot.get_user()
     checkpoint_dir = get_log_folder(user_name, plugin_name='experimental_rag')
|
|||||||
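The hunks above hoist the vector-store selection out of `Rag问答` to module level: try the Milvus backend, and fall back to the simple LlamaIndex worker if the import fails. The same optional-dependency pattern can be sketched generically; the module names below are hypothetical stand-ins (a nonexistent accelerator falling back to a stdlib module):

```python
BACKEND = "fast"

if BACKEND == "fast":
    try:
        # Hypothetical optional dependency; not expected to be installed.
        import some_optional_accelerator as backend
    except ImportError:
        BACKEND = "simple"

if BACKEND == "simple":
    # Always-available stdlib fallback standing in for the simple worker.
    import json as backend
```

Doing this at import time (rather than per call) means the fallback decision is made once, and every later use of `backend` is already resolved.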
@@ -6,10 +6,7 @@ from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_ver
 from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
 from crazy_functions.agent_fns.python_comment_agent import PythonCodeComment
 from crazy_functions.diagram_fns.file_tree import FileNode
-from crazy_functions.agent_fns.watchdog import WatchDog
 from shared_utils.advanced_markdown_format import markdown_convertion_for_file
-from loguru import logger
-
 
 
 def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
@@ -27,13 +24,12 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
         file_tree_struct.add_file(file_path, file_path)
 
     # <第一步,逐个文件分析,多线程>
-    lang = "" if not plugin_kwargs["use_chinese"] else " (you must use Chinese)"
     for index, fp in enumerate(file_manifest):
         # 读取文件
         with open(fp, 'r', encoding='utf-8', errors='replace') as f:
             file_content = f.read()
         prefix = ""
-        i_say = prefix + f'Please conclude the following source code at {os.path.relpath(fp, project_folder)} with only one sentence{lang}, the code is:\n```{file_content}```'
+        i_say = prefix + f'Please conclude the following source code at {os.path.relpath(fp, project_folder)} with only one sentence, the code is:\n```{file_content}```'
         i_say_show_user = prefix + f'[{index+1}/{len(file_manifest)}] 请用一句话对下面的程序文件做一个整体概述: {fp}'
         # 装载请求内容
         MAX_TOKEN_SINGLE_FILE = 2560
@@ -41,7 +37,7 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
         inputs_array.append(i_say)
         inputs_show_user_array.append(i_say_show_user)
         history_array.append([])
-        sys_prompt_array.append(f"You are a software architecture analyst analyzing a source code project. Do not dig into details, tell me what the code is doing in general. Your answer must be short, simple and clear{lang}.")
+        sys_prompt_array.append("You are a software architecture analyst analyzing a source code project. Do not dig into details, tell me what the code is doing in general. Your answer must be short, simple and clear.")
     # 文件读取完成,对每一个源代码文件,生成一个请求线程,发送到大模型进行分析
     gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
         inputs_array = inputs_array,
@@ -54,20 +50,10 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
     )
 
     # <第二步,逐个文件分析,生成带注释文件>
-    tasks = ["" for _ in range(len(file_manifest))]
-    def bark_fn(tasks):
-        for i in range(len(tasks)): tasks[i] = "watchdog is dead"
-    wd = WatchDog(timeout=10, bark_fn=lambda: bark_fn(tasks), interval=3, msg="ThreadWatcher timeout")
-    wd.begin_watch()
     from concurrent.futures import ThreadPoolExecutor
     executor = ThreadPoolExecutor(max_workers=get_conf('DEFAULT_WORKER_NUM'))
-    def _task_multi_threading(i_say, gpt_say, fp, file_tree_struct, index):
-        language = 'Chinese' if plugin_kwargs["use_chinese"] else 'English'
-        def observe_window_update(x):
-            if tasks[index] == "watchdog is dead":
-                raise TimeoutError("ThreadWatcher: watchdog is dead")
-            tasks[index] = x
-        pcc = PythonCodeComment(llm_kwargs, plugin_kwargs, language=language, observe_window_update=observe_window_update)
+    def _task_multi_threading(i_say, gpt_say, fp, file_tree_struct):
+        pcc = PythonCodeComment(llm_kwargs, language='English')
         pcc.read_file(path=fp, brief=gpt_say)
         revised_path, revised_content = pcc.begin_comment_source_code(None, None)
         file_tree_struct.manifest[fp].revised_path = revised_path
@@ -79,8 +65,7 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
         with open("crazy_functions/agent_fns/python_comment_compare.html", 'r', encoding='utf-8') as f:
             html_template = f.read()
         warp = lambda x: "```python\n\n" + x + "\n\n```"
-        from themes.theme import load_dynamic_theme
-        _, advanced_css, _, _ = load_dynamic_theme("Default")
+        from themes.theme import advanced_css
         html_template = html_template.replace("ADVANCED_CSS", advanced_css)
         html_template = html_template.replace("REPLACE_CODE_FILE_LEFT", pcc.get_markdown_block_in_html(markdown_convertion_for_file(warp(pcc.original_content))))
         html_template = html_template.replace("REPLACE_CODE_FILE_RIGHT", pcc.get_markdown_block_in_html(markdown_convertion_for_file(warp(revised_content))))
@@ -88,21 +73,17 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
         file_tree_struct.manifest[fp].compare_html = compare_html_path
         with open(compare_html_path, 'w', encoding='utf-8') as f:
             f.write(html_template)
-        tasks[index] = ""
+        # print('done 1')
 
     chatbot.append([None, f"正在处理:"])
     futures = []
-    index = 0
     for i_say, gpt_say, fp in zip(gpt_response_collection[0::2], gpt_response_collection[1::2], file_manifest):
-        future = executor.submit(_task_multi_threading, i_say, gpt_say, fp, file_tree_struct, index)
-        index += 1
+        future = executor.submit(_task_multi_threading, i_say, gpt_say, fp, file_tree_struct)
         futures.append(future)
 
-    # <第三步,等待任务完成>
     cnt = 0
     while True:
         cnt += 1
-        wd.feed()
         time.sleep(3)
         worker_done = [h.done() for h in futures]
         remain = len(worker_done) - sum(worker_done)
@@ -111,18 +92,14 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
         preview_html_list = []
         for done, fp in zip(worker_done, file_manifest):
             if not done: continue
-            if hasattr(file_tree_struct.manifest[fp], 'compare_html'):
-                preview_html_list.append(file_tree_struct.manifest[fp].compare_html)
-            else:
-                logger.error(f"文件: {fp} 的注释结果未能成功")
+            preview_html_list.append(file_tree_struct.manifest[fp].compare_html)
         file_links = generate_file_link(preview_html_list)
 
         yield from update_ui_lastest_msg(
-            f"当前任务: <br/>{'<br/>'.join(tasks)}.<br/>" +
-            f"剩余源文件数量: {remain}.<br/>" +
-            f"已完成的文件: {sum(worker_done)}.<br/>" +
+            f"剩余源文件数量: {remain}.\n\n" +
+            f"已完成的文件: {sum(worker_done)}.\n\n" +
             file_links +
-            "<br/>" +
+            "\n\n" +
             ''.join(['.']*(cnt % 10 + 1)
         ), chatbot=chatbot, history=history, delay=0)
         yield from update_ui(chatbot=chatbot, history=[]) # 刷新界面
@@ -143,7 +120,6 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
 @CatchException
 def 注释Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
     history = []    # 清空历史,以免输入溢出
-    plugin_kwargs["use_chinese"] = plugin_kwargs.get("use_chinese", False)
     import glob, os
     if os.path.exists(txt):
         project_folder = txt
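This plugin drives the per-file commenting tasks through a `ThreadPoolExecutor` and polls `Future.done()` in a loop so the UI can report how many files remain. That submit-and-poll pattern can be exercised standalone:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    """Stand-in for one per-file commenting task."""
    time.sleep(0.05)
    return x * x

executor = ThreadPoolExecutor(max_workers=4)
futures = [executor.submit(slow_square, i) for i in range(8)]

# Poll until all workers finish, like the plugin's progress loop
# (which sleeps 3s per round and refreshes the chat UI instead).
while True:
    worker_done = [f.done() for f in futures]
    remain = len(worker_done) - sum(worker_done)
    if remain == 0:
        break
    time.sleep(0.01)

results = [f.result() for f in futures]
executor.shutdown()
```

Polling (rather than `as_completed`) fits here because the loop's real job is periodic UI refresh, not reacting to individual completions.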
@@ -1,36 +0,0 @@
-
-from toolbox import get_conf, update_ui
-from crazy_functions.plugin_template.plugin_class_template import GptAcademicPluginTemplate, ArgProperty
-from crazy_functions.SourceCode_Comment import 注释Python项目
-
-class SourceCodeComment_Wrap(GptAcademicPluginTemplate):
-    def __init__(self):
-        """
-        请注意`execute`会执行在不同的线程中,因此您在定义和使用类变量时,应当慎之又慎!
-        """
-        pass
-
-    def define_arg_selection_menu(self):
-        """
-        定义插件的二级选项菜单
-        """
-        gui_definition = {
-            "main_input":
-                ArgProperty(title="路径", description="程序路径(上传文件后自动填写)", default_value="", type="string").model_dump_json(), # 主输入,自动从输入框同步
-            "use_chinese":
-                ArgProperty(title="注释语言", options=["英文", "中文"], default_value="英文", description="无", type="dropdown").model_dump_json(),
-            # "use_emoji":
-            #     ArgProperty(title="在注释中使用emoji", options=["禁止", "允许"], default_value="禁止", description="无", type="dropdown").model_dump_json(),
-        }
-        return gui_definition
-
-    def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
-        """
-        执行插件
-        """
-        if plugin_kwargs["use_chinese"] == "中文":
-            plugin_kwargs["use_chinese"] = True
-        else:
-            plugin_kwargs["use_chinese"] = False
-
-        yield from 注释Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
@@ -68,7 +68,6 @@ Be aware:
 1. You must NOT modify the indent of code.
 2. You are NOT authorized to change or translate non-comment code, and you are NOT authorized to add empty lines either, toggle qu.
 3. Use {LANG} to add comments and docstrings. Do NOT translate Chinese that is already in the code.
-4. Besides adding a docstring, use the ⭐ symbol to annotate the most core and important line of code within the function, explaining its role.
 
 ------------------ Example ------------------
 INPUT:
@@ -117,66 +116,10 @@ def zip_result(folder):
 '''
 
 
-revise_funtion_prompt_chinese = '''
-您需要阅读以下代码,并根据以下说明修订源代码({FILE_BASENAME}):
-1. 如果源代码中包含函数的话, 你应该分析给定函数实现了什么功能
-2. 如果源代码中包含函数的话, 你需要为函数添加docstring, docstring必须使用中文
-
-请注意:
-1. 你不得修改代码的缩进
-2. 你无权更改或翻译代码中的非注释部分,也不允许添加空行
-3. 使用 {LANG} 添加注释和文档字符串。不要翻译代码中已有的中文
-4. 除了添加docstring之外, 使用⭐符号给该函数中最核心、最重要的一行代码添加注释,并说明其作用
-
------------------- 示例 ------------------
-INPUT:
-```
-L0000 |
-L0001 |def zip_result(folder):
-L0002 |    t = gen_time_str()
-L0003 |    zip_folder(folder, get_log_folder(), f"result.zip")
-L0004 |    return os.path.join(get_log_folder(), f"result.zip")
-L0005 |
-L0006 |
-```
-
-OUTPUT:
-
-<instruction_1_purpose>
-该函数用于压缩指定文件夹,并返回生成的`zip`文件的路径。
-</instruction_1_purpose>
-<instruction_2_revised_code>
-```
-def zip_result(folder):
-    """
-    该函数将指定的文件夹压缩成ZIP文件, 并将其存储在日志文件夹中。
-
-    输入参数:
-        folder (str): 需要压缩的文件夹的路径。
-    返回值:
-        str: 日志文件夹中创建的ZIP文件的路径。
-    """
-    t = gen_time_str()
-    zip_folder(folder, get_log_folder(), f"result.zip") # ⭐ 执行文件夹的压缩
-    return os.path.join(get_log_folder(), f"result.zip")
-```
-</instruction_2_revised_code>
------------------- End of Example ------------------
-
-
------------------- the real INPUT you need to process NOW ({FILE_BASENAME}) ------------------
-```
-{THE_CODE}
-```
-{INDENT_REMINDER}
-{BRIEF_REMINDER}
-{HINT_REMINDER}
-'''
-
-
 class PythonCodeComment():
 
-    def __init__(self, llm_kwargs, plugin_kwargs, language, observe_window_update) -> None:
+    def __init__(self, llm_kwargs, language) -> None:
         self.original_content = ""
         self.full_context = []
         self.full_context_with_line_no = []
@@ -184,13 +127,7 @@ class PythonCodeComment():
         self.page_limit = 100 # 100 lines of code each page
         self.ignore_limit = 20
         self.llm_kwargs = llm_kwargs
-        self.plugin_kwargs = plugin_kwargs
         self.language = language
-        self.observe_window_update = observe_window_update
-        if self.language == "chinese":
-            self.core_prompt = revise_funtion_prompt_chinese
-        else:
-            self.core_prompt = revise_funtion_prompt
         self.path = None
         self.file_basename = None
         self.file_brief = ""
@@ -321,7 +258,7 @@ class PythonCodeComment():
         hint_reminder = "" if hint is None else f"(Reminder: do not ignore or modify code such as `{hint}`, provide complete code in the OUTPUT.)"
         self.llm_kwargs['temperature'] = 0
         result = predict_no_ui_long_connection(
-            inputs=self.core_prompt.format(
+            inputs=revise_funtion_prompt.format(
                 LANG=self.language,
                 FILE_BASENAME=self.file_basename,
                 THE_CODE=code,
@@ -411,7 +348,6 @@ class PythonCodeComment():
         try:
             # yield from update_ui_lastest_msg(f"({self.file_basename}) 正在读取下一段代码片段:\n", chatbot=chatbot, history=history, delay=0)
             next_batch, line_no_start, line_no_end = self.get_next_batch()
-            self.observe_window_update(f"正在处理{self.file_basename} - {line_no_start}/{len(self.full_context)}\n")
             # yield from update_ui_lastest_msg(f"({self.file_basename}) 处理代码片段:\n\n{next_batch}", chatbot=chatbot, history=history, delay=0)
 
             hint = None
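The agent fills its revision prompt via `str.format` with `{LANG}`, `{FILE_BASENAME}` and `{THE_CODE}` placeholders (the project's template is named `revise_funtion_prompt`, sic). A minimal sketch of that templating step; the template text below is an illustrative stand-in, not the project's full prompt:

```python
# Illustrative template; only placeholder names are taken from the diff.
revision_prompt_template = (
    "You need to revise the source code ({FILE_BASENAME}).\n"
    "Use {LANG} to add comments and docstrings.\n"
    "----\n{THE_CODE}\n----\n"
)

filled = revision_prompt_template.format(
    LANG="English",
    FILE_BASENAME="example.py",
    THE_CODE="def f():\n    pass",
)
```

Note that `str.format` only scans the template for braces, so code substituted in via `THE_CODE` may itself contain `{}` safely; only literal braces in the template string would need doubling.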
@@ -1,47 +1,39 @@
-import token
-import tokenize
-import copy
-import io
-
-
-def remove_python_comments(input_source: str) -> str:
-    source_flag = copy.copy(input_source)
-    source = io.StringIO(input_source)
-    ls = input_source.split('\n')
-    prev_toktype = token.INDENT
-    readline = source.readline
-
-    def get_char_index(lineno, col):
-        # find the index of the char in the source code
-        if lineno == 1:
-            return len('\n'.join(ls[:(lineno-1)])) + col
-        else:
-            return len('\n'.join(ls[:(lineno-1)])) + col + 1
-
-    def replace_char_between(start_lineno, start_col, end_lineno, end_col, source, replace_char, ls):
-        # replace char between start_lineno, start_col and end_lineno, end_col with replace_char, but keep '\n' and ' '
-        b = get_char_index(start_lineno, start_col)
-        e = get_char_index(end_lineno, end_col)
-        for i in range(b, e):
-            if source[i] == '\n':
-                source = source[:i] + '\n' + source[i+1:]
-            elif source[i] == ' ':
-                source = source[:i] + ' ' + source[i+1:]
-            else:
-                source = source[:i] + replace_char + source[i+1:]
-        return source
-
-    tokgen = tokenize.generate_tokens(readline)
-    for toktype, ttext, (slineno, scol), (elineno, ecol), ltext in tokgen:
-        if toktype == token.STRING and (prev_toktype == token.INDENT):
-            source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
-        elif toktype == token.STRING and (prev_toktype == token.NEWLINE):
-            source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
-        elif toktype == tokenize.COMMENT:
-            source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
-        prev_toktype = toktype
-    return source_flag
+import ast
+
+class CommentRemover(ast.NodeTransformer):
+    def visit_FunctionDef(self, node):
+        # 移除函数的文档字符串
+        if (node.body and isinstance(node.body[0], ast.Expr) and
+                isinstance(node.body[0].value, ast.Str)):
+            node.body = node.body[1:]
+        self.generic_visit(node)
+        return node
+
+    def visit_ClassDef(self, node):
+        # 移除类的文档字符串
+        if (node.body and isinstance(node.body[0], ast.Expr) and
+                isinstance(node.body[0].value, ast.Str)):
+            node.body = node.body[1:]
+        self.generic_visit(node)
+        return node
+
+    def visit_Module(self, node):
+        # 移除模块的文档字符串
+        if (node.body and isinstance(node.body[0], ast.Expr) and
+                isinstance(node.body[0].value, ast.Str)):
+            node.body = node.body[1:]
+        self.generic_visit(node)
+        return node
+
+
+def remove_python_comments(source_code):
+    # 解析源代码为 AST
+    tree = ast.parse(source_code)
+    # 移除注释
+    transformer = CommentRemover()
+    tree = transformer.visit(tree)
+    # 将处理后的 AST 转换回源代码
+    return ast.unparse(tree)
 
 
 # 示例使用
 if __name__ == "__main__":
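The AST-based rewrite strips docstrings by dropping a leading string-expression statement from each node's body, then regenerates source with `ast.unparse` (which also discards `#` comments, since they never reach the AST). A sketch of the same idea using the non-deprecated `ast.Constant` check instead of `ast.Str`, plus a guard for bodies that would become empty (assumptions: names below are mine, not the project's):

```python
import ast

class DocstringRemover(ast.NodeTransformer):
    """Drop the leading docstring from modules, classes and functions."""

    def _strip(self, node):
        if (node.body and isinstance(node.body[0], ast.Expr)
                and isinstance(node.body[0].value, ast.Constant)
                and isinstance(node.body[0].value.value, str)):
            # Removing the docstring may empty the body; keep it valid.
            node.body = node.body[1:] or [ast.Pass()]
        self.generic_visit(node)
        return node

    visit_Module = visit_ClassDef = visit_FunctionDef = _strip

def remove_docstrings(source_code):
    tree = DocstringRemover().visit(ast.parse(source_code))
    return ast.unparse(tree)  # '#' comments are gone too: they never enter the AST
```

`ast.unparse` requires Python 3.9+; the tokenize-based variant it replaces preserved original layout exactly, which the AST round-trip does not.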
@@ -3,7 +3,7 @@ import re
 import shutil
 import numpy as np
 from loguru import logger
-from toolbox import update_ui, update_ui_lastest_msg, get_log_folder, gen_time_str
+from toolbox import update_ui, update_ui_lastest_msg, get_log_folder
 from toolbox import get_conf, promote_file_to_downloadzone
 from crazy_functions.latex_fns.latex_toolbox import PRESERVE, TRANSFORM
 from crazy_functions.latex_fns.latex_toolbox import set_forbidden_text, set_forbidden_text_begin_end, set_forbidden_text_careful_brace
@@ -468,70 +468,3 @@ def write_html(sp_file_contents, sp_file_result, chatbot, project_folder):
     except:
         from toolbox import trimmed_format_exc
         logger.error('writing html result failed:', trimmed_format_exc())
-
-
-def upload_to_gptac_cloud_if_user_allow(chatbot, arxiv_id):
-    try:
-        # 如果用户允许,我们将arxiv论文PDF上传到GPTAC学术云
-        from toolbox import map_file_to_sha256
-        # 检查是否顺利,如果没有生成预期的文件,则跳过
-        is_result_good = False
-        for file_path in chatbot._cookies.get("files_to_promote", []):
-            if file_path.endswith('translate_zh.pdf'):
-                is_result_good = True
-        if not is_result_good:
-            return
-        # 上传文件
-        for file_path in chatbot._cookies.get("files_to_promote", []):
-            align_name = None
-            # normalized name
-            for name in ['translate_zh.pdf', 'comparison.pdf']:
-                if file_path.endswith(name): align_name = name
-            # if match any align name
-            if align_name:
-                logger.info(f'Uploading to GPTAC cloud as the user has set `allow_cloud_io`: {file_path}')
-                with open(file_path, 'rb') as f:
-                    import requests
-                    url = 'https://cloud-2.agent-matrix.com/arxiv_tf_paper_normal_upload'
-                    files = {'file': (align_name, f, 'application/octet-stream')}
-                    data = {
-                        'arxiv_id': arxiv_id,
-                        'file_hash': map_file_to_sha256(file_path),
-                        'language': 'zh',
-                        'trans_prompt': 'to_be_implemented',
-                        'llm_model': 'to_be_implemented',
-                        'llm_model_param': 'to_be_implemented',
-                    }
-                    resp = requests.post(url=url, files=files, data=data, timeout=30)
-                    logger.info(f'Uploading terminate ({resp.status_code})`: {file_path}')
-    except:
-        # 如果上传失败,不会中断程序,因为这是次要功能
-        pass
-
-def check_gptac_cloud(arxiv_id, chatbot):
-    import requests
-    success = False
-    downloaded = []
-    try:
-        for pdf_target in ['translate_zh.pdf', 'comparison.pdf']:
-            url = 'https://cloud-2.agent-matrix.com/arxiv_tf_paper_normal_exist'
-            data = {
-                'arxiv_id': arxiv_id,
-                'name': pdf_target,
-            }
-            resp = requests.post(url=url, data=data)
-            cache_hit_result = resp.text.strip('"')
-            if cache_hit_result.startswith("http"):
-                url = cache_hit_result
-                logger.info(f'Downloading from GPTAC cloud: {url}')
-                resp = requests.get(url=url, timeout=30)
-                target = os.path.join(get_log_folder(plugin_name='gptac_cloud'), gen_time_str(), pdf_target)
-                os.makedirs(os.path.dirname(target), exist_ok=True)
-                with open(target, 'wb') as f:
-                    f.write(resp.content)
-                new_path = promote_file_to_downloadzone(target, chatbot=chatbot)
-                success = True
-                downloaded.append(new_path)
-    except:
-        pass
-    return success, downloaded
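The removed upload helper sends `map_file_to_sha256(file_path)` as the `file_hash` form field. The diff does not show that helper's body; assuming it computes a content digest, a hypothetical chunked SHA-256 implementation with the same shape would be:

```python
import hashlib

def map_file_to_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks so
    large PDFs are never loaded into memory at once. (Hypothetical
    reimplementation; the project's toolbox version is not shown here.)"""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()
```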
@@ -6,16 +6,12 @@ class SafeUnpickler(pickle.Unpickler):
     def get_safe_classes(self):
         from crazy_functions.latex_fns.latex_actions import LatexPaperFileGroup, LatexPaperSplit
         from crazy_functions.latex_fns.latex_toolbox import LinkedListNode
-        from numpy.core.multiarray import scalar
-        from numpy import dtype
         # 定义允许的安全类
         safe_classes = {
             # 在这里添加其他安全的类
             'LatexPaperFileGroup': LatexPaperFileGroup,
             'LatexPaperSplit': LatexPaperSplit,
             'LinkedListNode': LinkedListNode,
-            'scalar': scalar,
-            'dtype': dtype,
         }
         return safe_classes
 
@@ -26,6 +22,8 @@ class SafeUnpickler(pickle.Unpickler):
         for class_name in self.safe_classes.keys():
             if (class_name in f'{module}.{name}'):
                 match_class_name = class_name
+        if module == 'numpy' or module.startswith('numpy.'):
+            return super().find_class(module, name)
         if match_class_name is not None:
             return self.safe_classes[match_class_name]
         # 如果尝试加载未授权的类,则抛出异常
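`SafeUnpickler` follows the restricted-unpickling pattern from the `pickle` documentation: override `find_class` so only allow-listed classes can be resolved during deserialization, and refuse everything else (the hunk above additionally passes any `numpy` module through wholesale). A minimal self-contained version of the allow-list idea, with stdlib classes standing in for the project's LaTeX classes:

```python
import io
import pickle

class SafeUnpickler(pickle.Unpickler):
    """Resolve only allow-listed (module, name) pairs; refuse the rest."""

    SAFE = {("collections", "OrderedDict")}  # classes we trust to reconstruct

    def find_class(self, module, name):
        if (module, name) in self.SAFE:
            return super().find_class(module, name)
        # Any class outside the allow-list aborts the load.
        raise pickle.UnpicklingError(f"forbidden class {module}.{name}")

def safe_loads(data):
    return SafeUnpickler(io.BytesIO(data)).load()
```

Exact `(module, name)` matching is safer than the substring check in the diff, which would also accept any class whose dotted path merely contains an allow-listed name.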
@@ -644,15 +644,6 @@ def run_in_subprocess(func):
 
 
 def _merge_pdfs(pdf1_path, pdf2_path, output_path):
-    try:
-        logger.info("Merging PDFs using _merge_pdfs_ng")
-        _merge_pdfs_ng(pdf1_path, pdf2_path, output_path)
-    except:
-        logger.info("Merging PDFs using _merge_pdfs_legacy")
-        _merge_pdfs_legacy(pdf1_path, pdf2_path, output_path)
-
-
-def _merge_pdfs_ng(pdf1_path, pdf2_path, output_path):
     import PyPDF2  # PyPDF2这个库有严重的内存泄露问题,把它放到子进程中运行,从而方便内存的释放
     from PyPDF2.generic import NameObject, TextStringObject,ArrayObject,FloatObject,NumberObject
 
@@ -697,206 +688,65 @@ def _merge_pdfs_ng(pdf1_path, pdf2_path, output_path):
         ),
         0,
     )
-    if "/Annots" in new_page:
-        annotations = new_page["/Annots"]
+    if '/Annots' in page1:
+        page1_annot_id = [annot.idnum for annot in page1['/Annots']]
+    else:
+        page1_annot_id = []
+
+    if '/Annots' in page2:
+        page2_annot_id = [annot.idnum for annot in page2['/Annots']]
+    else:
+        page2_annot_id = []
+    if '/Annots' in new_page:
+        annotations = new_page['/Annots']
         for i, annot in enumerate(annotations):
             annot_obj = annot.get_object()
 
             # 检查注释类型是否是链接(/Link)
-            if annot_obj.get("/Subtype") == "/Link":
+            if annot_obj.get('/Subtype') == '/Link':
                 # 检查是否为内部链接跳转(/GoTo)或外部URI链接(/URI)
-                action = annot_obj.get("/A")
+                action = annot_obj.get('/A')
                 if action:
 
-                    if "/S" in action and action["/S"] == "/GoTo":
+                    if '/S' in action and action['/S'] == '/GoTo':
                         # 内部链接:跳转到文档中的某个页面
-                        dest = action.get("/D")  # 目标页或目标位置
-                        # if dest and annot.idnum in page2_annot_id:
-                        #     if dest in pdf2_reader.named_destinations:
-                        if dest and page2.annotations:
-                            if annot in page2.annotations:
+                        dest = action.get('/D')  # 目标页或目标位置
+                        if dest and annot.idnum in page2_annot_id:
                             # 获取原始文件中跳转信息,包括跳转页面
-                                destination = pdf2_reader.named_destinations[
-                                    dest
-                                ]
-                                page_number = (
-                                    pdf2_reader.get_destination_page_number(
-                                        destination
-                                    )
-                                )
+                            destination = pdf2_reader.named_destinations[dest]
+                            page_number = pdf2_reader.get_destination_page_number(destination)
                             #更新跳转信息,跳转到对应的页面和,指定坐标 (100, 150),缩放比例为 100%
                             #“/D”:[10,'/XYZ',100,100,0]
-                                if destination.dest_array[1] == "/XYZ":
-                                    annot_obj["/A"].update(
-                                        {
-                                            NameObject("/D"): ArrayObject(
-                                                [
-                                                    NumberObject(page_number),
-                                                    destination.dest_array[1],
-                                                    FloatObject(
-                                                        destination.dest_array[2]
-                                                        + int(
-                                                            page1.mediaBox.getWidth()
-                                                        )
-                                                    ),
-                                                    destination.dest_array[3],
-                                                    destination.dest_array[4],
-                                                ]
-                                            )  # 确保键和值是 PdfObject
-                                        }
-                                    )
-                                else:
-                                    annot_obj["/A"].update(
-                                        {
-                                            NameObject("/D"): ArrayObject(
-                                                [
-                                                    NumberObject(page_number),
-                                                    destination.dest_array[1],
-                                                ]
-                                            )  # 确保键和值是 PdfObject
-                                        }
-                                    )
-
-                                rect = annot_obj.get("/Rect")
+                            annot_obj['/A'].update({
+                                NameObject("/D"): ArrayObject([NumberObject(page_number),destination.dest_array[1], FloatObject(destination.dest_array[2] + int(page1.mediaBox.getWidth())) ,destination.dest_array[3],destination.dest_array[4]]) # 确保键和值是 PdfObject
+                            })
+                            rect = annot_obj.get('/Rect')
                             # 更新点击坐标
-                                rect = ArrayObject(
-                                    [
-                                        FloatObject(
-                                            rect[0]
-                                            + int(page1.mediaBox.getWidth())
-                                        ),
-                                        rect[1],
-                                        FloatObject(
-                                            rect[2]
-                                            + int(page1.mediaBox.getWidth())
-                                        ),
-                                        rect[3],
-                                    ]
-                                )
-                                annot_obj.update(
-                                    {
-                                        NameObject(
-                                            "/Rect"
-                                        ): rect  # 确保键和值是 PdfObject
-                                    }
-                                )
-                        # if dest and annot.idnum in page1_annot_id:
-                        #     if dest in pdf1_reader.named_destinations:
-                        if dest and page1.annotations:
-                            if annot in page1.annotations:
+                            rect = ArrayObject([FloatObject(rect[0]+ int(page1.mediaBox.getWidth())),rect[1],
+                                                FloatObject(rect[2]+int(page1.mediaBox.getWidth())),rect[3] ])
+                            annot_obj.update({
+                                NameObject("/Rect"): rect # 确保键和值是 PdfObject
+                            })
+                        if dest and annot.idnum in page1_annot_id:
                             # 获取原始文件中跳转信息,包括跳转页面
-                                destination = pdf1_reader.named_destinations[
-                                    dest
-                                ]
-                                page_number = (
-                                    pdf1_reader.get_destination_page_number(
-                                        destination
-                                    )
-                                )
+                            destination = pdf1_reader.named_destinations[dest]
+                            page_number = pdf1_reader.get_destination_page_number(destination)
                             #更新跳转信息,跳转到对应的页面和,指定坐标 (100, 150),缩放比例为 100%
                             #“/D”:[10,'/XYZ',100,100,0]
-                                if destination.dest_array[1] == "/XYZ":
-                                    annot_obj["/A"].update(
-                                        {
-                                            NameObject("/D"): ArrayObject(
-                                                [
-                                                    NumberObject(page_number),
-                                                    destination.dest_array[1],
-                                                    FloatObject(
-                                                        destination.dest_array[2]
-                                                    ),
-                                                    destination.dest_array[3],
|
|
||||||
destination.dest_array[4],
|
|
||||||
]
|
|
||||||
) # 确保键和值是 PdfObject
|
|
||||||
}
|
|
||||||
)
|
|
||||||
else:
|
|
||||||
annot_obj["/A"].update(
|
|
||||||
{
|
|
||||||
NameObject("/D"): ArrayObject(
|
|
||||||
[
|
|
||||||
NumberObject(page_number),
|
|
||||||
destination.dest_array[1],
|
|
||||||
]
|
|
||||||
) # 确保键和值是 PdfObject
|
|
||||||
}
|
|
||||||
)
|
|
||||||
|
|
||||||
rect = annot_obj.get("/Rect")
|
elif '/S' in action and action['/S'] == '/URI':
|
||||||
rect = ArrayObject(
|
|
||||||
[
|
|
||||||
FloatObject(rect[0]),
|
|
||||||
rect[1],
|
|
||||||
FloatObject(rect[2]),
|
|
||||||
rect[3],
|
|
||||||
]
|
|
||||||
)
|
|
||||||
annot_obj.update(
|
|
||||||
{
|
|
||||||
NameObject(
|
|
||||||
"/Rect"
|
|
||||||
): rect # 确保键和值是 PdfObject
|
|
||||||
}
|
|
||||||
)
|
|
||||||
|
|
||||||
elif "/S" in action and action["/S"] == "/URI":
|
|
||||||
# 外部链接:跳转到某个URI
|
# 外部链接:跳转到某个URI
|
||||||
uri = action.get("/URI")
|
uri = action.get('/URI')
|
||||||
output_writer.addPage(new_page)
|
output_writer.addPage(new_page)
|
||||||
# Save the merged PDF file
|
|
||||||
with open(output_path, "wb") as output_file:
|
|
||||||
output_writer.write(output_file)
|
|
||||||
|
|
||||||
|
|
||||||
def _merge_pdfs_legacy(pdf1_path, pdf2_path, output_path):
|
|
||||||
import PyPDF2 # PyPDF2这个库有严重的内存泄露问题,把它放到子进程中运行,从而方便内存的释放
|
|
||||||
|
|
||||||
Percent = 0.95
|
|
||||||
# raise RuntimeError('PyPDF2 has a serious memory leak problem, please use other tools to merge PDF files.')
|
|
||||||
# Open the first PDF file
|
|
||||||
with open(pdf1_path, "rb") as pdf1_file:
|
|
||||||
pdf1_reader = PyPDF2.PdfFileReader(pdf1_file)
|
|
||||||
# Open the second PDF file
|
|
||||||
with open(pdf2_path, "rb") as pdf2_file:
|
|
||||||
pdf2_reader = PyPDF2.PdfFileReader(pdf2_file)
|
|
||||||
# Create a new PDF file to store the merged pages
|
|
||||||
output_writer = PyPDF2.PdfFileWriter()
|
|
||||||
# Determine the number of pages in each PDF file
|
|
||||||
num_pages = max(pdf1_reader.numPages, pdf2_reader.numPages)
|
|
||||||
# Merge the pages from the two PDF files
|
|
||||||
for page_num in range(num_pages):
|
|
||||||
# Add the page from the first PDF file
|
|
||||||
if page_num < pdf1_reader.numPages:
|
|
||||||
page1 = pdf1_reader.getPage(page_num)
|
|
||||||
else:
|
|
||||||
page1 = PyPDF2.PageObject.createBlankPage(pdf1_reader)
|
|
||||||
# Add the page from the second PDF file
|
|
||||||
if page_num < pdf2_reader.numPages:
|
|
||||||
page2 = pdf2_reader.getPage(page_num)
|
|
||||||
else:
|
|
||||||
page2 = PyPDF2.PageObject.createBlankPage(pdf1_reader)
|
|
||||||
# Create a new empty page with double width
|
|
||||||
new_page = PyPDF2.PageObject.createBlankPage(
|
|
||||||
width=int(
|
|
||||||
int(page1.mediaBox.getWidth())
|
|
||||||
+ int(page2.mediaBox.getWidth()) * Percent
|
|
||||||
),
|
|
||||||
height=max(page1.mediaBox.getHeight(), page2.mediaBox.getHeight()),
|
|
||||||
)
|
|
||||||
new_page.mergeTranslatedPage(page1, 0, 0)
|
|
||||||
new_page.mergeTranslatedPage(
|
|
||||||
page2,
|
|
||||||
int(
|
|
||||||
int(page1.mediaBox.getWidth())
|
|
||||||
- int(page2.mediaBox.getWidth()) * (1 - Percent)
|
|
||||||
),
|
|
||||||
0,
|
|
||||||
)
|
|
||||||
output_writer.addPage(new_page)
|
output_writer.addPage(new_page)
|
||||||
# Save the merged PDF file
|
# Save the merged PDF file
|
||||||
with open(output_path, "wb") as output_file:
|
with open(output_path, "wb") as output_file:
|
||||||
|
|||||||
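The `_merge_pdfs_legacy` routine above lays the two source pages side by side with a slight overlap controlled by `Percent`. A minimal sketch of just that geometry rule, separated from PyPDF2 (the helper name and the exact integer rounding are illustrative, not from the repo):

```python
def side_by_side_layout(w1, h1, w2, h2, percent=0.95):
    """Return (merged_width, merged_height, x_offset_for_page2).

    The merged page is page1's width plus `percent` of page2's width;
    page2 is then translated so it overlaps page1's right edge by
    (1 - percent) of its own width, matching the legacy merge routine.
    """
    merged_width = int(w1 + w2 * percent)
    merged_height = max(h1, h2)
    x_offset = int(w1 - w2 * (1 - percent))
    return merged_width, merged_height, x_offset
```

With `percent=1.0` the pages sit exactly edge to edge; the default 0.95 trims the visual gap between the original and the translated column.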
@@ -4,9 +4,7 @@ from toolbox import promote_file_to_downloadzone, extract_archive
 from toolbox import generate_file_link, zip_folder
 from crazy_functions.crazy_utils import get_files_from_everything
 from shared_utils.colorful import *
-from loguru import logger
 import os
-import time

 def refresh_key(doc2x_api_key):
     import requests, json
@@ -24,140 +22,105 @@ def refresh_key(doc2x_api_key):
     raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
     return doc2x_api_key


 def 解析PDF_DOC2X_转Latex(pdf_file_path):
-    zip_file_path, unzipped_folder = 解析PDF_DOC2X(pdf_file_path, format='tex')
-    return unzipped_folder
-
-
-def 解析PDF_DOC2X(pdf_file_path, format='tex'):
-    """
-    format: 'tex', 'md', 'docx'
-    """
     import requests, json, os
     DOC2X_API_KEY = get_conf('DOC2X_API_KEY')
     latex_dir = get_log_folder(plugin_name="pdf_ocr_latex")
-    markdown_dir = get_log_folder(plugin_name="pdf_ocr")
     doc2x_api_key = DOC2X_API_KEY
+    if doc2x_api_key.startswith('sk-'):
+        url = "https://api.doc2x.noedgeai.com/api/v1/pdf"
+    else:
+        doc2x_api_key = refresh_key(doc2x_api_key)
+        url = "https://api.doc2x.noedgeai.com/api/platform/pdf"

-    # < ------ 第1步:上传 ------ >
-    logger.info("Doc2x 第1步:上传")
-    with open(pdf_file_path, 'rb') as file:
-        res = requests.post(
-            "https://v2.doc2x.noedgeai.com/api/v2/parse/pdf",
-            headers={"Authorization": "Bearer " + doc2x_api_key},
-            data=file
-        )
-    # res_json = []
+    res = requests.post(
+        url,
+        files={"file": open(pdf_file_path, "rb")},
+        data={"ocr": "1"},
+        headers={"Authorization": "Bearer " + doc2x_api_key}
+    )
+    res_json = []
     if res.status_code == 200:
-        res_json = res.json()
+        decoded = res.content.decode("utf-8")
+        for z_decoded in decoded.split('\n'):
+            if len(z_decoded) == 0: continue
+            assert z_decoded.startswith("data: ")
+            z_decoded = z_decoded[len("data: "):]
+            decoded_json = json.loads(z_decoded)
+            res_json.append(decoded_json)
     else:
-        raise RuntimeError(f"Doc2x return an error: {res.json()}")
-    uuid = res_json['data']['uid']
-
-    # < ------ 第2步:轮询等待 ------ >
-    logger.info("Doc2x 第2步:轮询等待")
-    params = {'uid': uuid}
-    while True:
-        res = requests.get(
-            'https://v2.doc2x.noedgeai.com/api/v2/parse/status',
-            headers={"Authorization": "Bearer " + doc2x_api_key},
-            params=params
-        )
-        res_json = res.json()
-        if res_json['data']['status'] == "success":
-            break
-        elif res_json['data']['status'] == "processing":
-            time.sleep(3)
-            logger.info(f"Doc2x is processing at {res_json['data']['progress']}%")
-        elif res_json['data']['status'] == "failed":
-            raise RuntimeError(f"Doc2x return an error: {res_json}")
-
-    # < ------ 第3步:提交转化 ------ >
-    logger.info("Doc2x 第3步:提交转化")
-    data = {
-        "uid": uuid,
-        "to": format,
-        "formula_mode": "dollar",
-        "filename": "output"
-    }
-    res = requests.post(
-        'https://v2.doc2x.noedgeai.com/api/v2/convert/parse',
-        headers={"Authorization": "Bearer " + doc2x_api_key},
-        json=data
-    )
+        raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
+    uuid = res_json[0]['uuid']
+    to = "latex" # latex, md, docx
+    url = "https://api.doc2x.noedgeai.com/api/export"+"?request_id="+uuid+"&to="+to
+
+    res = requests.get(url, headers={"Authorization": "Bearer " + doc2x_api_key})
+    latex_zip_path = os.path.join(latex_dir, gen_time_str() + '.zip')
+    latex_unzip_path = os.path.join(latex_dir, gen_time_str())
     if res.status_code == 200:
-        res_json = res.json()
+        with open(latex_zip_path, "wb") as f: f.write(res.content)
     else:
-        raise RuntimeError(f"Doc2x return an error: {res.json()}")
-
-    # < ------ 第4步:等待结果 ------ >
-    logger.info("Doc2x 第4步:等待结果")
-    params = {'uid': uuid}
-    while True:
-        res = requests.get(
-            'https://v2.doc2x.noedgeai.com/api/v2/convert/parse/result',
-            headers={"Authorization": "Bearer " + doc2x_api_key},
-            params=params
-        )
-        res_json = res.json()
-        if res_json['data']['status'] == "success":
-            break
-        elif res_json['data']['status'] == "processing":
-            time.sleep(3)
-            logger.info(f"Doc2x still processing")
-        elif res_json['data']['status'] == "failed":
-            raise RuntimeError(f"Doc2x return an error: {res_json}")
-
-    # < ------ 第5步:最后的处理 ------ >
-    logger.info("Doc2x 第5步:最后的处理")
-    if format=='tex':
-        target_path = latex_dir
-    if format=='md':
-        target_path = markdown_dir
-    os.makedirs(target_path, exist_ok=True)
-
-    max_attempt = 3
-    # < ------ 下载 ------ >
-    for attempt in range(max_attempt):
-        try:
-            result_url = res_json['data']['url']
-            res = requests.get(result_url)
-            zip_path = os.path.join(target_path, gen_time_str() + '.zip')
-            unzip_path = os.path.join(target_path, gen_time_str())
-            if res.status_code == 200:
-                with open(zip_path, "wb") as f: f.write(res.content)
-            else:
-                raise RuntimeError(f"Doc2x return an error: {res.json()}")
-        except Exception as e:
-            if attempt < max_attempt - 1:
-                logger.error(f"Failed to download latex file, retrying... {e}")
-                time.sleep(3)
-                continue
-            else:
-                raise e
-
-    # < ------ 解压 ------ >
+        raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
     import zipfile
-    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
-        zip_ref.extractall(unzip_path)
-    return zip_path, unzip_path
+    with zipfile.ZipFile(latex_zip_path, 'r') as zip_ref:
+        zip_ref.extractall(latex_unzip_path)
+    return latex_unzip_path


 def 解析PDF_DOC2X_单文件(fp, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, DOC2X_API_KEY, user_request):

     def pdf2markdown(filepath):
-        chatbot.append((None, f"Doc2x 解析中"))
+        import requests, json, os
+        markdown_dir = get_log_folder(plugin_name="pdf_ocr")
+        doc2x_api_key = DOC2X_API_KEY
+        if doc2x_api_key.startswith('sk-'):
+            url = "https://api.doc2x.noedgeai.com/api/v1/pdf"
+        else:
+            doc2x_api_key = refresh_key(doc2x_api_key)
+            url = "https://api.doc2x.noedgeai.com/api/platform/pdf"
+
+        chatbot.append((None, "加载PDF文件,发送至DOC2X解析..."))
         yield from update_ui(chatbot=chatbot, history=history) # 刷新界面

-        md_zip_path, unzipped_folder = 解析PDF_DOC2X(filepath, format='md')
+        res = requests.post(
+            url,
+            files={"file": open(filepath, "rb")},
+            data={"ocr": "1"},
+            headers={"Authorization": "Bearer " + doc2x_api_key}
+        )
+        res_json = []
+        if res.status_code == 200:
+            decoded = res.content.decode("utf-8")
+            for z_decoded in decoded.split('\n'):
+                if len(z_decoded) == 0: continue
+                assert z_decoded.startswith("data: ")
+                z_decoded = z_decoded[len("data: "):]
+                decoded_json = json.loads(z_decoded)
+                res_json.append(decoded_json)
+                if 'limit exceeded' in decoded_json.get('status', ''):
+                    raise RuntimeError("Doc2x API 页数受限,请联系 Doc2x 方面,并更换新的 API 秘钥。")
+        else:
+            raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
+        uuid = res_json[0]['uuid']
+        to = "md" # latex, md, docx
+        url = "https://api.doc2x.noedgeai.com/api/export"+"?request_id="+uuid+"&to="+to
+
+        chatbot.append((None, f"读取解析: {url} ..."))
+        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
+
+        res = requests.get(url, headers={"Authorization": "Bearer " + doc2x_api_key})
+        md_zip_path = os.path.join(markdown_dir, gen_time_str() + '.zip')
+        if res.status_code == 200:
+            with open(md_zip_path, "wb") as f: f.write(res.content)
+        else:
+            raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
         promote_file_to_downloadzone(md_zip_path, chatbot=chatbot)
         chatbot.append((None, f"完成解析 {md_zip_path} ..."))
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
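Both Doc2x code paths in the hunk above share an upload → wait → download shape; the v2 branch makes the waiting explicit by polling the `parse/status` endpoint until it reports `success` or `failed`. That polling step can be sketched generically, with the HTTP call abstracted into an injected `fetch_status` callable (function and parameter names are illustrative, not the repo's):

```python
import time

def poll_until_done(fetch_status, interval=3, max_polls=200):
    """Poll `fetch_status()` until it reports success or failure.

    `fetch_status` should return a dict shaped like Doc2x's status
    responses: {'data': {'status': 'processing'|'success'|'failed', ...}}.
    """
    for _ in range(max_polls):
        res_json = fetch_status()
        status = res_json['data']['status']
        if status == "success":
            return res_json
        if status == "failed":
            raise RuntimeError(f"Doc2x returned an error: {res_json}")
        # status == "processing": wait and try again
        time.sleep(interval)
    raise TimeoutError("Doc2x did not finish within the polling budget")
```

Injecting the fetch function keeps the loop testable without network access; in the real plugin the callable would wrap `requests.get` with the `Authorization` header and `uid` parameter shown in the diff.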
@@ -180,7 +180,6 @@ version: '3'
 services:
   gpt_academic_with_latex:
     image: ghcr.io/binary-husky/gpt_academic_with_latex:master # (Auto Built by Dockerfile: docs/GithubAction+NoLocal+Latex)
-    # 对于ARM64设备,请将以上镜像名称替换为 ghcr.io/binary-husky/gpt_academic_with_latex_arm:master
     environment:
       # 请查阅 `config.py` 以查看所有的配置信息
       API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
docs/Dockerfile+JittorLLM (普通文件)
@@ -0,0 +1 @@
+# 此Dockerfile不再维护,请前往docs/GithubAction+JittorLLMs
docs/GithubAction+AllCapacityBeta (普通文件)
@@ -0,0 +1,57 @@
+# docker build -t gpt-academic-all-capacity -f docs/GithubAction+AllCapacity --network=host --build-arg http_proxy=http://localhost:10881 --build-arg https_proxy=http://localhost:10881 .
+# docker build -t gpt-academic-all-capacity -f docs/GithubAction+AllCapacityBeta --network=host .
+# docker run -it --net=host gpt-academic-all-capacity bash
+
+# 从NVIDIA源,从而支持显卡(检查宿主的nvidia-smi中的cuda版本必须>=11.3)
+FROM fuqingxu/11.3.1-runtime-ubuntu20.04-with-texlive:latest
+
+# edge-tts需要的依赖,某些pip包所需的依赖
+RUN apt update && apt install ffmpeg build-essential -y
+
+# use python3 as the system default python
+WORKDIR /gpt
+RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
+
+# # 非必要步骤,更换pip源 (以下三行,可以删除)
+# RUN echo '[global]' > /etc/pip.conf && \
+#     echo 'index-url = https://mirrors.aliyun.com/pypi/simple/' >> /etc/pip.conf && \
+#     echo 'trusted-host = mirrors.aliyun.com' >> /etc/pip.conf
+
+# 下载pytorch
+RUN python3 -m pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
+# 准备pip依赖
+RUN python3 -m pip install openai numpy arxiv rich
+RUN python3 -m pip install colorama Markdown pygments pymupdf
+RUN python3 -m pip install python-docx moviepy pdfminer
+RUN python3 -m pip install zh_langchain==0.2.1 pypinyin
+RUN python3 -m pip install rarfile py7zr
+RUN python3 -m pip install aliyun-python-sdk-core==2.13.3 pyOpenSSL webrtcvad scipy git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git
+# 下载分支
+WORKDIR /gpt
+RUN git clone --depth=1 https://github.com/binary-husky/gpt_academic.git
+WORKDIR /gpt/gpt_academic
+RUN git clone --depth=1 https://github.com/OpenLMLab/MOSS.git request_llms/moss
+
+RUN python3 -m pip install -r requirements.txt
+RUN python3 -m pip install -r request_llms/requirements_moss.txt
+RUN python3 -m pip install -r request_llms/requirements_qwen.txt
+RUN python3 -m pip install -r request_llms/requirements_chatglm.txt
+RUN python3 -m pip install -r request_llms/requirements_newbing.txt
+RUN python3 -m pip install nougat-ocr
+
+
+# 预热Tiktoken模块
+RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
+
+# 安装知识库插件的额外依赖
+RUN apt-get update && apt-get install libgl1 -y
+RUN pip3 install transformers protobuf langchain sentence-transformers faiss-cpu nltk beautifulsoup4 bitsandbytes tabulate icetk --upgrade
+RUN pip3 install unstructured[all-docs] --upgrade
+RUN python3 -c 'from check_proxy import warm_up_vectordb; warm_up_vectordb()'
+RUN rm -rf /usr/local/lib/python3.8/dist-packages/tests
+
+
+# COPY .cache /root/.cache
+# COPY config_private.py config_private.py
+# 启动
+CMD ["python3", "-u", "main.py"]
@@ -1,31 +1,32 @@
-# 此Dockerfile适用于"无本地模型"的环境构建,如果需要使用chatglm等本地模型,请参考 docs/Dockerfile+ChatGLM
+# 此Dockerfile适用于“无本地模型”的环境构建,如果需要使用chatglm等本地模型,请参考 docs/Dockerfile+ChatGLM
 # - 1 修改 `config.py`
 # - 2 构建 docker build -t gpt-academic-nolocal-latex -f docs/GithubAction+NoLocal+Latex .
 # - 3 运行 docker run -v /home/fuqingxu/arxiv_cache:/root/arxiv_cache --rm -it --net=host gpt-academic-nolocal-latex

-FROM menghuan1918/ubuntu_uv_ctex:latest
-ENV DEBIAN_FRONTEND=noninteractive
-SHELL ["/bin/bash", "-c"]
+FROM fuqingxu/python311_texlive_ctex:latest
+ENV PATH "$PATH:/usr/local/texlive/2022/bin/x86_64-linux"
+ENV PATH "$PATH:/usr/local/texlive/2023/bin/x86_64-linux"
+ENV PATH "$PATH:/usr/local/texlive/2024/bin/x86_64-linux"
+ENV PATH "$PATH:/usr/local/texlive/2025/bin/x86_64-linux"
+ENV PATH "$PATH:/usr/local/texlive/2026/bin/x86_64-linux"

+# 指定路径
 WORKDIR /gpt

-# 先复制依赖文件
-COPY requirements.txt .
+RUN pip3 install openai numpy arxiv rich
+RUN pip3 install colorama Markdown pygments pymupdf
+RUN pip3 install python-docx pdfminer
+RUN pip3 install nougat-ocr
+
+# 装载项目文件
+COPY . .
+
 # 安装依赖
-RUN pip install --break-system-packages openai numpy arxiv rich colorama Markdown pygments pymupdf python-docx pdfminer \
-    && pip install --break-system-packages -r requirements.txt \
-    && if [ "$(uname -m)" = "x86_64" ]; then \
-        pip install --break-system-packages nougat-ocr; \
-    fi \
-    && pip cache purge \
-    && rm -rf /root/.cache/pip/*
+RUN pip3 install -r requirements.txt

-# 创建非root用户
-RUN useradd -m gptuser && chown -R gptuser /gpt
-USER gptuser
+# edge-tts需要的依赖
+RUN apt update && apt install ffmpeg -y

-# 最后才复制代码文件,这样代码更新时只需重建最后几层,可以大幅减少docker pull所需的大小
-COPY --chown=gptuser:gptuser . .
-
 # 可选步骤,用于预热模块
 RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
@@ -256,8 +256,6 @@ model_info = {
         "max_token": 128000,
         "tokenizer": tokenizer_gpt4,
         "token_cnt": get_token_num_gpt4,
-        "openai_disable_system_prompt": True,
-        "openai_disable_stream": True,
     },
     "o1-mini": {
         "fn_with_ui": chatgpt_ui,
@@ -266,8 +264,6 @@ model_info = {
         "max_token": 128000,
         "tokenizer": tokenizer_gpt4,
         "token_cnt": get_token_num_gpt4,
-        "openai_disable_system_prompt": True,
-        "openai_disable_stream": True,
     },

     "gpt-4-turbo": {
@@ -385,14 +381,6 @@ model_info = {
         "tokenizer": tokenizer_gpt35,
         "token_cnt": get_token_num_gpt35,
     },
-    "glm-4-plus":{
-        "fn_with_ui": zhipu_ui,
-        "fn_without_ui": zhipu_noui,
-        "endpoint": None,
-        "max_token": 10124 * 8,
-        "tokenizer": tokenizer_gpt35,
-        "token_cnt": get_token_num_gpt35,
-    },

     # api_2d (此后不需要在此处添加api2d的接口了,因为下面的代码会自动添加)
     "api2d-gpt-4": {
@@ -1293,3 +1281,4 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot,
     # 更新一下llm_kwargs的参数,否则会出现参数不匹配的问题
     yield from method(inputs, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, stream, additional_fn)
+
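The `model_info` entries edited above act as a registry: each model name maps to its UI/non-UI handlers, endpoint, token budget, and tokenizer, and the dispatch code reads whatever fields it needs. A reduced, hypothetical sketch of that lookup pattern (the entries below are trimmed stand-ins, not the repo's real definitions):

```python
# Hypothetical, trimmed-down registry in the style of `model_info`;
# real entries also carry fn_with_ui / fn_without_ui / endpoint / tokenizer.
model_info = {
    "o1-mini":       {"max_token": 128000, "token_cnt": lambda txt: len(txt) // 4},
    "gpt-3.5-turbo": {"max_token": 16385,  "token_cnt": lambda txt: len(txt) // 4},
}

def fits_in_context(model: str, text: str) -> bool:
    """True when `text` fits the model's advertised context budget."""
    info = model_info[model]  # raises KeyError for unregistered models
    return info["token_cnt"](text) <= info["max_token"]
```

Centralizing per-model facts in one dict is what lets the diff above add or drop a model (like `glm-4-plus`) or a capability flag (like `openai_disable_stream`) without touching the dispatch logic.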
@@ -202,13 +202,10 @@ def predict_no_ui_long_connection(inputs:str, llm_kwargs:dict, history:list=[],
                 if (time.time()-observe_window[1]) > watch_dog_patience:
                     raise RuntimeError("用户取消了程序。")
         else: raise RuntimeError("意外Json结构:"+delta)
-    finish_reason = json_data.get('finish_reason', None) if json_data else None
-    if finish_reason == 'content_filter':
-        raise RuntimeError("由于提问含不合规内容被过滤。")
-    if finish_reason == 'length':
+    if json_data and json_data['finish_reason'] == 'content_filter':
+        raise RuntimeError("由于提问含不合规内容被Azure过滤。")
+    if json_data and json_data['finish_reason'] == 'length':
         raise ConnectionAbortedError("正常结束,但显示Token不足,导致输出不完整,请削减单次输入的文本量。")
     return result

@@ -341,7 +338,7 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
         # 前者是API2D的结束条件,后者是OPENAI的结束条件
         if ('data: [DONE]' in chunk_decoded) or (len(chunkjson['choices'][0]["delta"]) == 0):
             # 判定为数据流的结束,gpt_replying_buffer也写完了
-            log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
+            log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
             break
         # 处理数据流的主体
         status_text = f"finish_reason: {chunkjson['choices'][0].get('finish_reason', 'null')}"
@@ -375,7 +372,7 @@ def handle_o1_model_special(response, inputs, llm_kwargs, chatbot, history):
     try:
         chunkjson = json.loads(response.content.decode())
         gpt_replying_buffer = chunkjson['choices'][0]["message"]["content"]
-        log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
+        log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
         history[-1] = gpt_replying_buffer
         chatbot[-1] = (history[-2], history[-1])
         yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
@@ -539,3 +536,4 @@ def generate_payload(inputs:str, llm_kwargs:dict, history:list, system_prompt:st
     return headers,payload
+
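One side of the first hunk above reads `finish_reason` through a guarded `.get`, so a `None` payload or a missing key can no longer raise `KeyError` before the intended error message is produced. That pattern in isolation (a sketch; exception messages paraphrased into English, not the repo's exact strings):

```python
def check_finish_reason(json_data):
    # Guarded lookup: tolerate json_data being None or lacking the key entirely.
    finish_reason = json_data.get('finish_reason', None) if json_data else None
    if finish_reason == 'content_filter':
        raise RuntimeError("the request was blocked by the provider's content filter")
    if finish_reason == 'length':
        raise ConnectionAbortedError("output truncated: reduce the input size")
    return finish_reason
```

The direct-index form on the other side of the hunk, `json_data['finish_reason']`, fails with `KeyError` whenever the provider omits the field, which is why the guarded form is the safer shape for streamed responses.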
@@ -184,7 +184,7 @@ def predict(inputs, llm_kwargs, plugin_kwargs, chatbot, history=[], system_promp
|
|||||||
# 判定为数据流的结束,gpt_replying_buffer也写完了
|
# 判定为数据流的结束,gpt_replying_buffer也写完了
|
||||||
lastmsg = chatbot[-1][-1] + f"\n\n\n\n「{llm_kwargs['llm_model']}调用结束,该模型不具备上下文对话能力,如需追问,请及时切换模型。」"
|
lastmsg = chatbot[-1][-1] + f"\n\n\n\n「{llm_kwargs['llm_model']}调用结束,该模型不具备上下文对话能力,如需追问,请及时切换模型。」"
|
||||||
yield from update_ui_lastest_msg(lastmsg, chatbot, history, delay=1)
|
yield from update_ui_lastest_msg(lastmsg, chatbot, history, delay=1)
|
||||||
log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
|
log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
|
||||||
break
|
break
|
||||||
# 处理数据流的主体
|
# 处理数据流的主体
|
||||||
status_text = f"finish_reason: {chunkjson['choices'][0].get('finish_reason', 'null')}"
|
status_text = f"finish_reason: {chunkjson['choices'][0].get('finish_reason', 'null')}"
|
||||||
|
|||||||
@@ -216,7 +216,7 @@ def predict(inputs, llm_kwargs, plugin_kwargs, chatbot, history=[], system_promp
|
|||||||
if need_to_pass:
|
if need_to_pass:
|
||||||
pass
|
pass
|
||||||
elif is_last_chunk:
|
elif is_last_chunk:
|
||||||
log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
|
log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
|
||||||
# logger.info(f'[response] {gpt_replying_buffer}')
|
# logger.info(f'[response] {gpt_replying_buffer}')
|
||||||
break
|
break
|
||||||
else:
|
else:
|
||||||
|
|||||||
@@ -223,7 +223,7 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
|
|||||||
chatbot[-1] = (history[-2], history[-1])
|
chatbot[-1] = (history[-2], history[-1])
|
||||||
yield from update_ui(chatbot=chatbot, history=history, msg="正常") # 刷新界面
|
yield from update_ui(chatbot=chatbot, history=history, msg="正常") # 刷新界面
|
||||||
if chunkjson['event_type'] == 'stream-end':
|
if chunkjson['event_type'] == 'stream-end':
|
||||||
log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
|
log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
|
||||||
history[-1] = gpt_replying_buffer
|
history[-1] = gpt_replying_buffer
|
||||||
chatbot[-1] = (history[-2], history[-1])
|
chatbot[-1] = (history[-2], history[-1])
|
||||||
yield from update_ui(chatbot=chatbot, history=history, msg="正常") # 刷新界面
|
yield from update_ui(chatbot=chatbot, history=history, msg="正常") # 刷新界面
|
||||||
|
|||||||
@@ -109,7 +109,7 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
|
|||||||
gpt_replying_buffer += paraphrase['text'] # 使用 json 解析库进行处理
|
gpt_replying_buffer += paraphrase['text'] # 使用 json 解析库进行处理
|
||||||
chatbot[-1] = (inputs, gpt_replying_buffer)
|
chatbot[-1] = (inputs, gpt_replying_buffer)
|
||||||
history[-1] = gpt_replying_buffer
|
history[-1] = gpt_replying_buffer
|
||||||
log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
|
log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
|
||||||
yield from update_ui(chatbot=chatbot, history=history)
|
yield from update_ui(chatbot=chatbot, history=history)
|
||||||
if error_match:
|
if error_match:
|
||||||
history = history[-2] # 错误的不纳入对话
|
history = history[-2] # 错误的不纳入对话
|
||||||
|
|||||||
@@ -166,7 +166,7 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
|
|||||||
history = history[:-2]
|
history = history[:-2]
|
||||||
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
|
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
|
||||||
break
|
break
|
||||||
log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_bro_result, user_name=chatbot.get_user())
|
log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_bro_result)
|
||||||
|
|
||||||
def predict_no_ui_long_connection(inputs, llm_kwargs, history=[], sys_prompt="", observe_window=None,
|
def predict_no_ui_long_connection(inputs, llm_kwargs, history=[], sys_prompt="", observe_window=None,
|
||||||
console_slience=False):
|
console_slience=False):
|
||||||
|
|||||||
@@ -337,7 +337,7 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
                 # 前者是API2D的结束条件,后者是OPENAI的结束条件
                 if ('data: [DONE]' in chunk_decoded) or (len(chunkjson['choices'][0]["delta"]) == 0):
                     # 判定为数据流的结束,gpt_replying_buffer也写完了
-                    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
+                    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
                     break
                 # 处理数据流的主体
                 status_text = f"finish_reason: {chunkjson['choices'][0].get('finish_reason', 'null')}"
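The hunk above treats either the literal `data: [DONE]` sentinel (API2D) or an empty `delta` in the first choice (OpenAI) as the end of the SSE stream. A minimal, self-contained sketch of that detection logic (the function name is illustrative, not from the repository):

```python
import json

def is_stream_end(chunk_decoded: str) -> bool:
    """Return True when an OpenAI-style SSE chunk signals end of stream.

    Mirrors the two conditions in the diff above:
    - API2D terminates with a literal 'data: [DONE]' sentinel;
    - OpenAI terminates with a chunk whose first choice has an empty delta.
    """
    if 'data: [DONE]' in chunk_decoded:
        return True
    try:
        # Strip the SSE "data:" prefix before parsing the JSON payload
        chunkjson = json.loads(chunk_decoded.lstrip('data:').strip())
        return len(chunkjson['choices'][0]['delta']) == 0
    except (json.JSONDecodeError, KeyError, IndexError):
        return False
```

The caller would invoke this per decoded chunk and `break` out of the read loop once it returns True, exactly as the diff does.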
@@ -371,7 +371,7 @@ def handle_o1_model_special(response, inputs, llm_kwargs, chatbot, history):
     try:
         chunkjson = json.loads(response.content.decode())
         gpt_replying_buffer = chunkjson['choices'][0]["message"]["content"]
-        log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer, user_name=chatbot.get_user())
+        log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
         history[-1] = gpt_replying_buffer
         chatbot[-1] = (history[-2], history[-1])
         yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
@@ -59,7 +59,7 @@ def predict(inputs, llm_kwargs, plugin_kwargs, chatbot, history=[], system_promp
         chatbot[-1] = (inputs, response)
         yield from update_ui(chatbot=chatbot, history=history)

-    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=response, user_name=chatbot.get_user())
+    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=response)
     # 总结输出
     if response == f"[Local Message] 等待{model_name}响应中 ...":
         response = f"[Local Message] {model_name}响应异常 ..."
@@ -68,5 +68,5 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
     chatbot[-1] = [inputs, response]
     yield from update_ui(chatbot=chatbot, history=history)
     history.extend([inputs, response])
-    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=response, user_name=chatbot.get_user())
+    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=response)
     yield from update_ui(chatbot=chatbot, history=history)
@@ -97,5 +97,5 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
     chatbot[-1] = [inputs, response]
     yield from update_ui(chatbot=chatbot, history=history)
     history.extend([inputs, response])
-    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=response, user_name=chatbot.get_user())
+    log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=response)
     yield from update_ui(chatbot=chatbot, history=history)
@@ -2,15 +2,14 @@ https://public.agent-matrix.com/publish/gradio-3.32.10-py3-none-any.whl
 fastapi==0.110
 gradio-client==0.8
 pypdf2==2.12.1
-httpx<=0.25.2
 zhipuai==2.0.1
 tiktoken>=0.3.3
 requests[socks]
-pydantic==2.9.2
+pydantic==2.5.2
+llama-index~=0.10
 protobuf==3.20
 transformers>=4.27.1,<4.42
 scipdf_parser>=0.52
-spacy==3.7.4
 anthropic>=0.18.1
 python-markdown-math
 pymdown-extensions
@@ -33,14 +32,3 @@ loguru
 arxiv
 numpy
 rich
-
-
-llama-index-core==0.10.68
-llama-index-legacy==0.9.48
-llama-index-readers-file==0.1.33
-llama-index-readers-llama-parse==0.1.6
-llama-index-embeddings-azure-openai==0.1.10
-llama-index-embeddings-openai==0.1.10
-llama-parse==0.4.9
-mdit-py-plugins>=0.3.3
-linkify-it-py==2.0.3
@@ -138,9 +138,7 @@ def start_app(app_block, CONCURRENT_COUNT, AUTHENTICATION, PORT, SSL_KEYFILE, SS
     app_block.is_sagemaker = False

     gradio_app = App.create_app(app_block)
-    for route in list(gradio_app.router.routes):
-        if route.path == "/proxy={url_path:path}":
-            gradio_app.router.routes.remove(route)
     # --- --- replace gradio endpoint to forbid access to sensitive files --- ---
     if len(AUTHENTICATION) > 0:
         dependencies = []
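The removed lines above iterate over a copy of the router's route table and drop the `/proxy={url_path:path}` endpoint. The pattern works on any list of route objects; here is a framework-free sketch using stand-in route objects (the `remove_route` helper and `SimpleNamespace` stand-ins are illustrative, not the project's code):

```python
from types import SimpleNamespace

def remove_route(routes: list, path: str) -> list:
    """Drop every route whose .path matches, mutating `routes` in place.

    Iterating over a copy (list(routes)) is the key detail: removing from
    the list being iterated would silently skip elements.
    """
    for route in list(routes):
        if route.path == path:
            routes.remove(route)
    return routes

# Stand-in route table resembling gradio_app.router.routes
routes = [SimpleNamespace(path="/proxy={url_path:path}"),
          SimpleNamespace(path="/file={path_or_url:path}")]
remove_route(routes, "/proxy={url_path:path}")
```

The same iterate-over-a-copy idiom appears again in the larger `else:` branch removed further down.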
@@ -156,13 +154,9 @@ def start_app(app_block, CONCURRENT_COUNT, AUTHENTICATION, PORT, SSL_KEYFILE, SS
         @gradio_app.head("/file={path_or_url:path}", dependencies=dependencies)
         @gradio_app.get("/file={path_or_url:path}", dependencies=dependencies)
         async def file(path_or_url: str, request: fastapi.Request):
+            if len(AUTHENTICATION) > 0:
                 if not _authorize_user(path_or_url, request, gradio_app):
                     return "越权访问!"
-            stripped = path_or_url.lstrip().lower()
-            if stripped.startswith("https://") or stripped.startswith("http://"):
-                return "账户密码授权模式下, 禁止链接!"
-            if '../' in stripped:
-                return "非法路径!"
             return await endpoint(path_or_url, request)

         from fastapi import Request, status
@@ -173,26 +167,6 @@ def start_app(app_block, CONCURRENT_COUNT, AUTHENTICATION, PORT, SSL_KEYFILE, SS
             response.delete_cookie('access-token')
             response.delete_cookie('access-token-unsecure')
             return response
-    else:
-        dependencies = []
-        endpoint = None
-        for route in list(gradio_app.router.routes):
-            if route.path == "/file/{path:path}":
-                gradio_app.router.routes.remove(route)
-            if route.path == "/file={path_or_url:path}":
-                dependencies = route.dependencies
-                endpoint = route.endpoint
-                gradio_app.router.routes.remove(route)
-        @gradio_app.get("/file/{path:path}", dependencies=dependencies)
-        @gradio_app.head("/file={path_or_url:path}", dependencies=dependencies)
-        @gradio_app.get("/file={path_or_url:path}", dependencies=dependencies)
-        async def file(path_or_url: str, request: fastapi.Request):
-            stripped = path_or_url.lstrip().lower()
-            if stripped.startswith("https://") or stripped.startswith("http://"):
-                return "账户密码授权模式下, 禁止链接!"
-            if '../' in stripped:
-                return "非法路径!"
-            return await endpoint(path_or_url, request)

     # --- --- enable TTS (text-to-speech) functionality --- ---
     TTS_TYPE = get_conf("TTS_TYPE")
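Both removed branches of the `/file=` endpoint enforced the same two filters: reject absolute `http(s)://` URLs in password-protected mode, and reject `../` path traversal. A standalone sketch of that filter (the function name and English messages are assumptions; the original returns the Chinese strings shown in the diff):

```python
from typing import Optional

def is_forbidden_file_path(path_or_url: str) -> Optional[str]:
    """Return an error message for unsafe requests, or None if allowed.

    Mirrors the checks in the removed gradio `/file=` endpoint:
    external links and parent-directory traversal are both rejected.
    """
    stripped = path_or_url.lstrip().lower()
    if stripped.startswith("https://") or stripped.startswith("http://"):
        return "links are forbidden in password-protected mode"
    if '../' in stripped:
        return "illegal path"
    return None
```

Lower-casing before the prefix check means `HTTP://...` is caught as well, which plain `startswith("http://")` on the raw string would miss.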
@@ -104,7 +104,6 @@ def extract_archive(file_path, dest_dir):
         logger.info("Successfully extracted zip archive to {}".format(dest_dir))

     elif file_extension in [".tar", ".gz", ".bz2"]:
-        try:
             with tarfile.open(file_path, "r:*") as tarobj:
                 # 清理提取路径,移除任何不安全的元素
                 for member in tarobj.getmembers():
@@ -116,15 +115,6 @@ def extract_archive(file_path, dest_dir):

                 tarobj.extractall(path=dest_dir)
             logger.info("Successfully extracted tar archive to {}".format(dest_dir))
-        except tarfile.ReadError as e:
-            if file_extension == ".gz":
-                # 一些特别奇葩的项目,是一个gz文件,里面不是tar,只有一个tex文件
-                import gzip
-                with gzip.open(file_path, 'rb') as f_in:
-                    with open(os.path.join(dest_dir, 'main.tex'), 'wb') as f_out:
-                        f_out.write(f_in.read())
-            else:
-                raise e

     # 第三方库,需要预先pip install rarfile
     # 此外,Windows上还需要安装winrar软件,配置其Path环境变量,如"C:\Program Files\WinRAR"才可以
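The removed `except tarfile.ReadError` branch handled a quirk of some arXiv sources: the `.gz` file is not a tarball but a single gzip-compressed `.tex` file. The recovery idea, sketched as a standalone function (the name is illustrative; the fallback target `main.tex` comes from the diff):

```python
import gzip
import os
import tarfile

def extract_tar_or_bare_gz(file_path: str, dest_dir: str) -> None:
    """Try to untar file_path into dest_dir; if the .gz turns out not to
    be a tar archive, treat it as one gzip-compressed file and write its
    contents to main.tex, as the removed fallback did."""
    try:
        with tarfile.open(file_path, "r:*") as tarobj:
            tarobj.extractall(path=dest_dir)
    except tarfile.ReadError:
        if file_path.endswith(".gz"):
            # Bare gzip stream: decompress the single member to main.tex
            with gzip.open(file_path, "rb") as f_in:
                with open(os.path.join(dest_dir, "main.tex"), "wb") as f_out:
                    f_out.write(f_in.read())
        else:
            raise
```

`tarfile.open(..., "r:*")` raises `tarfile.ReadError` when the decompressed stream has no valid tar header, which is exactly the case this fallback targets.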
@@ -14,7 +14,6 @@ openai_regex = re.compile(
     r"sk-[a-zA-Z0-9_-]{92}$|" +
     r"sk-proj-[a-zA-Z0-9_-]{48}$|"+
     r"sk-proj-[a-zA-Z0-9_-]{124}$|"+
-    r"sk-proj-[a-zA-Z0-9_-]{156}$|"+ #新版apikey位数不匹配故修改此正则表达式
     r"sess-[a-zA-Z0-9]{40}$"
 )
 def is_openai_api_key(key):
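The removed alternative matched 156-character `sk-proj-` keys, which the comment says were added because newer project keys no longer fit the older lengths. A self-contained version of this length-based validation (lengths taken from the diff; real OpenAI key formats change over time, so this is a heuristic, not a specification):

```python
import re

# Alternatives mirror the diff: classic 92-char keys, project keys of
# several observed lengths, and 40-char session tokens.
openai_key_regex = re.compile(
    r"sk-[a-zA-Z0-9_-]{92}$|"
    r"sk-proj-[a-zA-Z0-9_-]{48}$|"
    r"sk-proj-[a-zA-Z0-9_-]{124}$|"
    r"sk-proj-[a-zA-Z0-9_-]{156}$|"
    r"sess-[a-zA-Z0-9]{40}$"
)

def looks_like_openai_key(key: str) -> bool:
    """Heuristic format check; does not verify the key is valid."""
    return bool(openai_key_regex.match(key.strip()))
```

Each alternative is anchored with `$`, so a candidate must match one of the exact lengths end-to-end; `re.match` anchors the start implicitly.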
@@ -1,12 +0,0 @@
-"""
-对项目中的各个插件进行测试。运行方法:直接运行 python tests/test_plugins.py
-"""
-
-import init_test
-import os, sys
-
-
-if __name__ == "__main__":
-    from test_utils import plugin_test
-
-    plugin_test(plugin='crazy_functions.数学动画生成manim->动画生成', main_input="A point moving along function culve y=sin(x), starting from x=0 and stop at x=4*\pi.")
@@ -1,7 +0,0 @@
-import init_test
-
-from crazy_functions.pdf_fns.parse_pdf_via_doc2x import 解析PDF_DOC2X_转Latex
-
-# 解析PDF_DOC2X_转Latex("gpt_log/arxiv_cache_old/2410.10819/workfolder/merge.pdf")
-# 解析PDF_DOC2X_转Latex("gpt_log/arxiv_cache_ooo/2410.07095/workfolder/merge.pdf")
-解析PDF_DOC2X_转Latex("2410.11190v2.pdf")
@@ -1029,7 +1029,7 @@ def check_repeat_upload(new_pdf_path, pdf_hash):
     # 如果所有页的内容都相同,返回 True
     return False, None

-def log_chat(llm_model: str, input_str: str, output_str: str, user_name: str=default_user_name):
+def log_chat(llm_model: str, input_str: str, output_str: str):
     try:
         if output_str and input_str and llm_model:
             uid = str(uuid.uuid4().hex)
@@ -1038,8 +1038,8 @@ def log_chat(llm_model: str, input_str: str, output_str: str, user_name: str=def
             logger.bind(chat_msg=True).info(dedent(
                 """
                 ╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
-                [UID/USER]
-                {uid}/{user_name}
+                [UID]
+                {uid}
                 [Model]
                 {llm_model}
                 [Query]
@@ -1047,6 +1047,6 @@ def log_chat(llm_model: str, input_str: str, output_str: str, user_name: str=def
                 [Response]
                 {output_str}
                 ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-                """).format(uid=uid, user_name=user_name, llm_model=llm_model, input_str=input_str, output_str=output_str))
+                """).format(uid=uid, llm_model=llm_model, input_str=input_str, output_str=output_str))
     except:
         logger.error(trimmed_format_exc())
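The three `log_chat` hunks drop the `user_name` parameter and the `[UID/USER]` line from the boxed log template. A stdlib-only sketch of the richer behaviour, with loguru's `logger.bind(...)` replaced by a plain message builder (illustrative, not the project's implementation; the template fields come from the diff):

```python
import uuid
from textwrap import dedent

def format_chat_log(llm_model: str, input_str: str, output_str: str,
                    user_name: str = "default_user") -> str:
    """Build the chat-log record; the UID/USER line is what the
    diff above removes."""
    uid = uuid.uuid4().hex
    template = dedent("""\
        [UID/USER]
        {uid}/{user_name}
        [Model]
        {llm_model}
        [Query]
        {input_str}
        [Response]
        {output_str}
        """)
    return template.format(uid=uid, user_name=user_name, llm_model=llm_model,
                           input_str=input_str, output_str=output_str)
```

In the original, this string is handed to a loguru logger bound with `chat_msg=True`, so a sink filter can route chat records to a dedicated log file.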