Default prompt word count control
2025-12-06 14:36:48 +00:00 · 2024-11-06 00:47:56 +08:00 · 2024-11-05 02:08:12 +08:00 · 2024-11-03 23:05:02 +08:00 · 2024-11-03 22:54:19 +08:00 · 2024-11-03 22:49:29 +08:00
--- a/.github/workflows/build-with-jittorllms.yml
+++ b/.github/workflows/build-with-jittorllms.yml
@@ -1,44 +0,0 @@
 # https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
 name: build-with-jittorllms
 on:
  push:
    branches:
      - 'master'
 env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}_jittorllms
 jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Log in to the Container registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          file: docs/GithubAction+JittorLLMs
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
--- a/.github/workflows/build-with-all-capacity-beta.yml
+++ b/.github/workflows/build-with-all-capacity-beta.yml
@@ -1,14 +1,14 @@
 # https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
-name: build-with-all-capacity-beta
+name: build-with-latex-arm
 on:
  push:
    branches:
-      - 'master'
+      - "master"
 env:
  REGISTRY: ghcr.io
-  IMAGE_NAME: ${{ github.repository }}_with_all_capacity_beta
+  IMAGE_NAME: ${{ github.repository }}_with_latex_arm
 jobs:
  build-and-push-image:
@@ -18,11 +18,17 @@ jobs:
      packages: write
    steps:
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Checkout repository
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
      - name: Log in to the Container registry
-        uses: docker/login-action@v2
+        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
@@ -35,10 +41,11 @@ jobs:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      - name: Build and push Docker image
-        uses: docker/build-push-action@v4
+        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
-          file: docs/GithubAction+AllCapacityBeta
+          platforms: linux/arm64
          file: docs/GithubAction+NoLocal+Latex
          tags: ${{ steps.meta.outputs.tags }}
-          labels: ${{ steps.meta.outputs.labels }}
+          labels: ${{ steps.meta.outputs.labels }}
--- a/README.md
+++ b/README.md
@@ -1,5 +1,6 @@
 > [!IMPORTANT]
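-> 2024.6.1: Version 3.80 adds a second-level plugin menu (see the wiki for details)  
+> 2024.10.10: After a sudden power outage, the file server hosting the [whl package](https://drive.google.com/file/d/19U_hsLoMrjOlQSzYS3pzWX9fTzyusArP/view?usp=sharing) was restored on an emergency basis  
 > 2024.10.8: Version 3.90 adds preliminary support for llama-index; version 3.80 adds a second-level plugin menu (see the wiki for details)  
 > 2024.5.1: Added Doc2x-based translation of PDF papers, [see details](https://github.com/binary-husky/gpt_academic/wiki/Doc2x)  
 > 2024.3.11: Full support for Chinese large language models such as Qwen, GLM, and DeepseekCoder! SoVits voice-cloning module, [see details](https://www.bilibili.com/video/BV1Rp421S7tF/) 
 > 2024.1.17: When installing dependencies, use the **versions pinned** in `requirements.txt`. Install command: `pip install -r requirements.txt`. This project is fully open source and free; you can support its development by subscribing to the [online service](https://github.com/binary-husky/gpt_academic/wiki/online).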
--- a/check_proxy.py
+++ b/check_proxy.py
@@ -1,24 +1,36 @@
 from loguru import logger
 def check_proxy(proxies, return_ip=False):
    """
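    Check the proxy configuration and return the result.
    Args:
        proxies (dict): Dictionary containing the http and https proxy settings.
        return_ip (bool, optional): Whether to return the proxy's IP address. Defaults to False.
    Returns:
        str or None: The check result message, or the proxy's IP address if `return_ip` is True.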
    """
    import requests
    proxies_https = proxies['https'] if proxies is not None else '无'
    ip = None
    try:
-        response = requests.get("https://ipapi.co/json/", proxies=proxies, timeout=4)
+        response = requests.get("https://ipapi.co/json/", proxies=proxies, timeout=4)  # ⭐ Send a GET request to fetch the proxy info
        data = response.json()
        if 'country_name' in data:
            country = data['country_name']
            result = f"代理配置 {proxies_https}, 代理所在地：{country}"
-            if 'ip' in data: ip = data['ip']
+            if 'ip' in data:
                ip = data['ip']
        elif 'error' in data:
-            alternative, ip = _check_with_backup_source(proxies)
+            alternative, ip = _check_with_backup_source(proxies)  # ⭐ Fall back to the backup source to check the proxy
            if alternative is None:
                result = f"代理配置 {proxies_https}, 代理所在地：未知，IP查询频率受限"
            else:
                result = f"代理配置 {proxies_https}, 代理所在地：{alternative}"
        else:
            result = f"代理配置 {proxies_https}, 代理数据解析失败：{data}"
        if not return_ip:
            logger.warning(result)
            return result
@@ -33,17 +45,33 @@ def check_proxy(proxies, return_ip=False):
            return ip
 def _check_with_backup_source(proxies):
    """
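    Check the proxy through a backup source and fetch the corresponding info.
    Args:
        proxies (dict): Dictionary containing the proxy settings.
    Returns:
        tuple: A (geo, ip) tuple with the proxy's location info and IP address.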
    """
    import random, string, requests
    random_string = ''.join(random.choices(string.ascii_letters + string.digits, k=32))
    try:
-        res_json = requests.get(f"http://{random_string}.edns.ip-api.com/json", proxies=proxies, timeout=4).json()
+        res_json = requests.get(f"http://{random_string}.edns.ip-api.com/json", proxies=proxies, timeout=4).json()  # ⭐ Query the backup source through the proxy
        return res_json['dns']['geo'], res_json['dns']['ip']
    except:
        return None, None
 def backup_and_download(current_version, remote_version):
    """
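-    One-click update protocol: back up and download
+    One-click update protocol: back up the current version, then download and extract the remote version.
     Args:
         current_version (str): Current version number.
         remote_version (str): Remote version number.
     Returns:
         str: Path to the new version's directory.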
    """
    from toolbox import get_conf
    import shutil
@@ -60,7 +88,7 @@ def backup_and_download(current_version, remote_version):
    proxies = get_conf('proxies')
    try:    r = requests.get('https://github.com/binary-husky/chatgpt_academic/archive/refs/heads/master.zip', proxies=proxies, stream=True)
    except: r = requests.get('https://public.agent-matrix.com/publish/master.zip', proxies=proxies, stream=True)
-    zip_file_path = backup_dir+'/master.zip'
+    zip_file_path = backup_dir+'/master.zip'  # ⭐ Path where the backup file is saved
    with open(zip_file_path, 'wb+') as f:
        f.write(r.content)
    dst_path = new_version_dir
@@ -76,6 +104,17 @@ def backup_and_download(current_version, remote_version):
 def patch_and_restart(path):
    """
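    One-click update protocol: overwrite and restart
    Args:
        path (str): Path to the new version's code
    Notes:
        If your program does not use the private config file config_private.py, config.py is copied to config_private.py to avoid losing your configuration.
    Update procedure:
        - Copy the latest code into the current directory
        - Update the pip dependencies
        - If the update fails, prompt the user to install the dependencies manually and restart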
    """
    from distutils import dir_util
    import shutil
@@ -84,32 +123,43 @@ def patch_and_restart(path):
    import time
    import glob
    from shared_utils.colorful import log亮黄, log亮绿, log亮红
-    # if not using config_private, move origin config.py as config_private.py
+
    if not os.path.exists('config_private.py'):
        log亮黄('由于您没有设置config_private.py私密配置，现将您的现有配置移动至config_private.py以防止配置丢失，',
              '另外您可以随时在history子文件夹下找回旧版的程序。')
        shutil.copyfile('config.py', 'config_private.py')
    path_new_version = glob.glob(path + '/*-master')[0]
-    dir_util.copy_tree(path_new_version, './')
+    dir_util.copy_tree(path_new_version, './')  # ⭐ Copy the latest code into the current directory
    log亮绿('代码已经更新，即将更新pip包依赖……')
    for i in reversed(range(5)): time.sleep(1); log亮绿(i)
    try:
        import subprocess
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-r', 'requirements.txt'])
    except:
        log亮红('pip包依赖安装出现问题，需要手动安装新增的依赖库 `python -m pip install -r requirements.txt`，然后在用常规的`python main.py`的方式启动。')
    log亮绿('更新完成，您可以随时在history子文件夹下找回旧版的程序，5s之后重启')
    log亮红('假如重启失败，您可能需要手动安装新增的依赖库 `python -m pip install -r requirements.txt`，然后在用常规的`python main.py`的方式启动。')
    log亮绿(' ------------------------------ -----------------------------------')
    for i in reversed(range(8)): time.sleep(1); log亮绿(i)
-    os.execl(sys.executable, sys.executable, *sys.argv)
+    os.execl(sys.executable, sys.executable, *sys.argv)  # Restart the program
 def get_current_version():
    """
    Get the current version number.
    Returns:
        str: The current version number, or an empty string if it cannot be read.
    """
    import json
    try:
        with open('./version', 'r', encoding='utf8') as f:
-            current_version = json.loads(f.read())['version']
+            current_version = json.loads(f.read())['version']  # ⭐ Extract the version number from the parsed JSON
    except:
        current_version = ""
    return current_version
@@ -118,6 +168,12 @@ def get_current_version():
 def auto_update(raise_error=False):
    """
    One-click update protocol: check the version and ask the user for confirmation
    Args:
        raise_error (bool, optional): Whether to raise on error. Defaults to False.
    Returns:
        None
    """
    try:
        from toolbox import get_conf
@@ -137,13 +193,13 @@ def auto_update(raise_error=False):
            current_version = json.loads(current_version)['version']
        if (remote_version - current_version) >= 0.01-1e-5:
            from shared_utils.colorful import log亮黄
-            log亮黄(f'\n新版本可用。新版本:{remote_version}，当前版本:{current_version}。{new_feature}')
+            log亮黄(f'\n新版本可用。新版本:{remote_version}，当前版本:{current_version}。{new_feature}')  # ⭐ Print the new-version notice to the console
            logger.info('（1）Github更新地址:\nhttps://github.com/binary-husky/chatgpt_academic\n')
            user_instruction = input('（2）是否一键更新代码（Y+回车=确认，输入其他/无输入+回车=不更新）？')
            if user_instruction in ['Y', 'y']:
-                path = backup_and_download(current_version, remote_version)
+                path = backup_and_download(current_version, remote_version)  # ⭐ Back up and download
                try:
-                    patch_and_restart(path)
+                    patch_and_restart(path)  # ⭐ Overwrite and restart
                except:
                    msg = '更新失败。'
                    if raise_error:
@@ -163,6 +219,9 @@ def auto_update(raise_error=False):
        logger.info(msg)
 def warm_up_modules():
    """
    Warm up modules: load specific modules and run their warm-up operations.
    """
    logger.info('正在执行一些模块的预热 ...')
    from toolbox import ProxyNetworkActivate
    from request_llms.bridge_all import model_info
@@ -173,6 +232,16 @@ def warm_up_modules():
        enc.encode("模块预热", disallowed_special=())
 def warm_up_vectordb():
    """
    Run warm-up operations for some modules.
    This function warms up some modules so that later stages of the pipeline run smoothly.
    ⭐ Key role: module warm-up
    Returns:
        None
    """
    logger.info('正在执行一些模块的预热 ...')
    from toolbox import ProxyNetworkActivate
    with ProxyNetworkActivate("Warmup_Modules"):
@@ -185,4 +254,4 @@ if __name__ == '__main__':
    os.environ['no_proxy'] = '*'  # Keep the proxy network from causing unintended side effects
    from toolbox import get_conf
    proxies = get_conf('proxies')
-    check_proxy(proxies)
+    check_proxy(proxies)
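
Review note on `auto_update` in check_proxy.py: the condition `(remote_version - current_version) >= 0.01-1e-5` compares two float version numbers with a small tolerance, presumably because a plain `>= 0.01` test can fail when the subtraction lands slightly below 0.01 in floating point. A minimal sketch of that check follows; the function name `needs_update` and its `step`/`eps` parameters are hypothetical and do not appear in the patch.

```python
def needs_update(remote_version: float, current_version: float,
                 step: float = 0.01, eps: float = 1e-5) -> bool:
    """Return True when the remote version is at least one `step` ahead.

    `eps` absorbs floating-point rounding in the subtraction, mirroring the
    `0.01-1e-5` threshold used in the diff above.
    """
    return (remote_version - current_version) >= step - eps


if __name__ == "__main__":
    print(needs_update(3.91, 3.90))   # one release ahead
    print(needs_update(3.90, 3.90))   # same version
```

Representing versions as floats limits the scheme to two decimal places; that constraint comes from the existing code, not from this sketch.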
--- a/config.py
+++ b/config.py
@@ -57,9 +57,9 @@ EMBEDDING_MODEL = "text-embedding-3-small"
 #   "yi-34b-chat-0205","yi-34b-chat-200k","yi-large","yi-medium","yi-spark","yi-large-turbo","yi-large-preview",
 # ]
 # --- --- --- ---
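-# In addition, when connecting through one-api/vllm/ollama,
+# In addition, when connecting through one-api/vllm/ollama/OpenRouter,
-# you can use the "one-api-*","vllm-*","ollama-*" prefixes to directly use models connected in a non-standard way, for example
+# you can use the "one-api-*","vllm-*","ollama-*","openrouter-*" prefixes to directly use models connected in a non-standard way, for example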
-# AVAIL_LLM_MODELS = ["one-api-claude-3-sonnet-20240229(max_token=100000)", "ollama-phi3(max_token=4096)"]
+# AVAIL_LLM_MODELS = ["one-api-claude-3-sonnet-20240229(max_token=100000)", "ollama-phi3(max_token=4096)","openrouter-openai/gpt-4o-mini","openrouter-openai/chatgpt-4o-latest"]
 # --- --- --- ---
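
Review note on config.py: the comment above describes model entries of the form `<prefix>-<model>(max_token=N)`. As an illustration only, the sketch below splits such an entry into prefix, model name, and token limit. The helper `parse_model_entry`, the `PREFIXES` tuple, and the default limit of 4096 are assumptions made for this example; the project's actual parsing code is not part of this diff and may differ.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper names; only the entry format comes from the config comment.
_SUFFIX = re.compile(r"^(?P<name>.+?)\(max_token=(?P<limit>\d+)\)$")
PREFIXES = ("one-api-", "vllm-", "ollama-", "openrouter-")


def parse_model_entry(entry: str,
                      default_max_token: int = 4096) -> Tuple[Optional[str], str, int]:
    """Split an AVAIL_LLM_MODELS entry into (prefix, model_name, max_token)."""
    match = _SUFFIX.match(entry)
    if match:
        name, limit = match.group("name"), int(match.group("limit"))
    else:
        # No "(max_token=N)" suffix: fall back to the assumed default.
        name, limit = entry, default_max_token
    for prefix in PREFIXES:
        if name.startswith(prefix):
            return prefix.rstrip("-"), name[len(prefix):], limit
    return None, name, limit


if __name__ == "__main__":
    for item in ["one-api-claude-3-sonnet-20240229(max_token=100000)",
                 "ollama-phi3(max_token=4096)",
                 "openrouter-openai/gpt-4o-mini"]:
        print(parse_model_entry(item))
```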
--- a/core_functional.py
+++ b/core_functional.py
@@ -17,7 +17,7 @@ def get_core_functions():
                            text_show_english=
                                r"Below is a paragraph from an academic paper. Polish the writing to meet the academic style, "
                                r"improve the spelling, grammar, clarity, concision and overall readability. When necessary, rewrite the whole sentence. "
-                                r"Firstly, you should provide the polished paragraph. "
+                                r"Firstly, you should provide the polished paragraph (in English). "
                                r"Secondly, you should list all your modification and explain the reasons to do so in markdown table.",
                            text_show_chinese=
                                r"作为一名中文学术论文写作改进助理，你的任务是改进所提供文本的拼写、语法、清晰、简洁和整体可读性，"
--- a/crazy_functional.py
+++ b/crazy_functional.py
@@ -6,7 +6,6 @@ from loguru import logger
 def get_crazy_functions():
    from crazy_functions.读文章写摘要 import 读文章写摘要
    from crazy_functions.生成函数注释 import 批量生成函数注释
    from crazy_functions.Rag_Interface import Rag问答
    from crazy_functions.SourceCode_Analyse import 解析项目本身
    from crazy_functions.SourceCode_Analyse import 解析一个Python项目
    from crazy_functions.SourceCode_Analyse import 解析一个Matlab项目
@@ -22,13 +21,13 @@ def get_crazy_functions():
    from crazy_functions.询问多个大语言模型 import 同时问询
    from crazy_functions.SourceCode_Analyse import 解析一个Lua项目
    from crazy_functions.SourceCode_Analyse import 解析一个CSharp项目
    from crazy_functions.总结word文档 import 总结word文档
    from crazy_functions.解析JupyterNotebook import 解析ipynb文件
    from crazy_functions.Conversation_To_File import 载入对话历史存档
    from crazy_functions.Conversation_To_File import 对话历史存档
    from crazy_functions.Conversation_To_File import Conversation_To_File_Wrap
    from crazy_functions.Conversation_To_File import 删除所有本地对话历史记录
    from crazy_functions.辅助功能 import 清除缓存
    from crazy_functions.批量文件询问 import 批量文件询问
    from crazy_functions.Markdown_Translate import Markdown英译中
    from crazy_functions.批量总结PDF文档 import 批量总结PDF文档
    from crazy_functions.PDF_Translate import 批量翻译PDF文档
@@ -50,15 +49,9 @@ def get_crazy_functions():
    from crazy_functions.Image_Generate import 图片生成_DALLE2, 图片生成_DALLE3, 图片修改_DALLE2
    from crazy_functions.Image_Generate_Wrap import ImageGen_Wrap
    from crazy_functions.SourceCode_Comment import 注释Python项目
    from crazy_functions.SourceCode_Comment_Wrap import SourceCodeComment_Wrap
    function_plugins = {
        "Rag智能召回": {
            "Group": "对话",
            "Color": "stop",
            "AsButton": False,
            "Info": "将问答数据记录到向量库中，作为长期参考。",
            "Function": HotReload(Rag问答),
        },
        "虚空终端": {
            "Group": "对话|编程|学术|智能体",
            "Color": "stop",
@@ -79,6 +72,7 @@ def get_crazy_functions():
            "AsButton": False,
            "Info": "上传一系列python源文件(或者压缩包), 为这些代码添加docstring | 输入参数为路径",
            "Function": HotReload(注释Python项目),
            "Class": SourceCodeComment_Wrap,
        },
        "载入对话历史存档（先上传存档或输入路径）": {
            "Group": "对话",
@@ -116,12 +110,13 @@ def get_crazy_functions():
            "Function": HotReload(Latex翻译中文并重新编译PDF),  # 当注册Class后，Function旧接口仅会在“虚空终端”中起作用
            "Class": Arxiv_Localize,    # 新一代插件需要注册Class
        },
-        "批量总结Word文档": {
+        "批量文件询问": {
            "Group": "学术",
            "Color": "stop",
            "AsButton": False,
-            "Info": "批量总结word文档 | 输入参数为路径",
+            "AdvancedArgs": True,
-            "Function": HotReload(总结word文档),
+            "Info": "通过在高级参数区写入prompt，可自定义询问逻辑，默认情况下为总结逻辑 | 输入参数为路径",
            "Function": HotReload(批量文件询问),
        },
        "解析整个Matlab项目": {
            "Group": "编程",
@@ -707,6 +702,31 @@ def get_crazy_functions():
        logger.error(trimmed_format_exc())
        logger.error("Load function plugin failed")
    try:
        from crazy_functions.Rag_Interface import Rag问答
        function_plugins.update(
            {
                "Rag智能召回": {
                    "Group": "对话",
                    "Color": "stop",
                    "AsButton": False,
                    "Info": "将问答数据记录到向量库中，作为长期参考。",
                    "Function": HotReload(Rag问答),
                },
            }
        )
    except:
        logger.error(trimmed_format_exc())
        logger.error("Load function plugin failed")
    # try:
    #     from crazy_functions.高级功能函数模板 import 测试图表渲染
    #     function_plugins.update({
--- a/crazy_functions/Latex_Function.py
+++ b/crazy_functions/Latex_Function.py
@@ -3,7 +3,7 @@ from toolbox import CatchException, report_exception, update_ui_lastest_msg, zip
 from functools import partial
 from loguru import logger
-import glob, os, requests, time, json, tarfile
+import glob, os, requests, time, json, tarfile, threading
 pj = os.path.join
 ARXIV_CACHE_DIR = get_conf("ARXIV_CACHE_DIR")
@@ -138,25 +138,43 @@ def arxiv_download(chatbot, history, txt, allow_cache=True):
    cached_translation_pdf = check_cached_translation_pdf(arxiv_id)
    if cached_translation_pdf and allow_cache: return cached_translation_pdf, arxiv_id
    url_tar = url_.replace('/abs/', '/e-print/')
    translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'e-print')
    extract_dst = pj(ARXIV_CACHE_DIR, arxiv_id, 'extract')
-    os.makedirs(translation_dir, exist_ok=True)
+    translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'e-print')
    # <-------------- download arxiv source file ------------->
    dst = pj(translation_dir, arxiv_id + '.tar')
-    if os.path.exists(dst):
+    os.makedirs(translation_dir, exist_ok=True)
-        yield from update_ui_lastest_msg("调用缓存", chatbot=chatbot, history=history)  # 刷新界面
+    # <-------------- download arxiv source file ------------->
    def fix_url_and_download():
        # for url_tar in [url_.replace('/abs/', '/e-print/'), url_.replace('/abs/', '/src/')]:
        for url_tar in [url_.replace('/abs/', '/src/'), url_.replace('/abs/', '/e-print/')]:
            proxies = get_conf('proxies')
            r = requests.get(url_tar, proxies=proxies)
            if r.status_code == 200:
                with open(dst, 'wb+') as f:
                    f.write(r.content)
                return True
        return False
    if os.path.exists(dst) and allow_cache:
        yield from update_ui_lastest_msg(f"调用缓存 {arxiv_id}", chatbot=chatbot, history=history)  # 刷新界面
        success = True
    else:
-        yield from update_ui_lastest_msg("开始下载", chatbot=chatbot, history=history)  # 刷新界面
+        yield from update_ui_lastest_msg(f"开始下载 {arxiv_id}", chatbot=chatbot, history=history)  # 刷新界面
-        proxies = get_conf('proxies')
+        success = fix_url_and_download()
-        r = requests.get(url_tar, proxies=proxies)
+        yield from update_ui_lastest_msg(f"下载完成 {arxiv_id}", chatbot=chatbot, history=history)  # 刷新界面
-        with open(dst, 'wb+') as f:
+
-            f.write(r.content)
+
    if not success:
        yield from update_ui_lastest_msg(f"下载失败 {arxiv_id}", chatbot=chatbot, history=history)
        raise tarfile.ReadError(f"论文下载失败 {arxiv_id}")
    # <-------------- extract file ------------->
    yield from update_ui_lastest_msg("下载完成", chatbot=chatbot, history=history)  # 刷新界面
    from toolbox import extract_archive
-    extract_archive(file_path=dst, dest_dir=extract_dst)
+    try:
        extract_archive(file_path=dst, dest_dir=extract_dst)
    except tarfile.ReadError:
        os.remove(dst)
        raise tarfile.ReadError(f"论文下载失败")
    return extract_dst, arxiv_id
@@ -320,11 +338,17 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
    # <-------------- more requirements ------------->
    if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
    more_req = plugin_kwargs.get("advanced_arg", "")
-    no_cache = more_req.startswith("--no-cache")
+
-    if no_cache: more_req.lstrip("--no-cache")
+    no_cache = ("--no-cache" in more_req)
    if no_cache: more_req = more_req.replace("--no-cache", "").strip()
    allow_gptac_cloud_io = ("--allow-cloudio" in more_req)  # 从云端下载翻译结果，以及上传翻译结果到云端
    if allow_gptac_cloud_io: more_req = more_req.replace("--allow-cloudio", "").strip()
    allow_cache = not no_cache
    _switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
    # <-------------- check deps ------------->
    try:
        import glob, os, time, subprocess
@@ -351,6 +375,20 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
        yield from update_ui(chatbot=chatbot, history=history)  # 刷新界面
        return
    # #################################################################
    if allow_gptac_cloud_io and arxiv_id:
        # 访问 GPTAC学术云，查询云端是否存在该论文的翻译版本
        from crazy_functions.latex_fns.latex_actions import check_gptac_cloud
        success, downloaded = check_gptac_cloud(arxiv_id, chatbot)
        if success:
            chatbot.append([
                f"检测到GPTAC云端存在翻译版本, 如果不满意翻译结果, 请禁用云端分享, 然后重新执行。", 
                None
            ])
            yield from update_ui(chatbot=chatbot, history=history)
            return
    #################################################################
    if os.path.exists(txt):
        project_folder = txt
    else:
@@ -388,14 +426,21 @@ def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot,
    # <-------------- zip PDF ------------->
    zip_res = zip_result(project_folder)
    if success:
        if allow_gptac_cloud_io and arxiv_id:
            # 如果用户允许，我们将翻译好的arxiv论文PDF上传到GPTAC学术云
            from crazy_functions.latex_fns.latex_actions import upload_to_gptac_cloud_if_user_allow
            threading.Thread(target=upload_to_gptac_cloud_if_user_allow, 
                args=(chatbot, arxiv_id), daemon=True).start()
        chatbot.append((f"成功啦", '请查收结果（压缩包）...'))
-        yield from update_ui(chatbot=chatbot, history=history);
+        yield from update_ui(chatbot=chatbot, history=history)
        time.sleep(1)  # 刷新界面
        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
    else:
        chatbot.append((f"失败了",
                        '虽然PDF生成失败了, 但请查收结果（压缩包）, 内含已经翻译的Tex文档, 您可以到Github Issue区, 用该压缩包进行反馈。如系统是Linux，请检查系统字体（见Github wiki） ...'))
-        yield from update_ui(chatbot=chatbot, history=history);
+        yield from update_ui(chatbot=chatbot, history=history)
        time.sleep(1)  # 刷新界面
        promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
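
Review note on the `--no-cache` change in Latex_Function.py: the removed lines used `more_req.lstrip("--no-cache")`, which has two problems. The result was discarded, and `str.lstrip` treats its argument as a set of characters rather than a prefix. The replacement with `in` and `.replace(...).strip()` avoids both, and also works when the wrapper prepends `--allow-cloudio` ahead of `--no-cache`. A small sketch of the same idea; `split_flags` is a hypothetical helper, not a function in the patch.

```python
from typing import Iterable, Set, Tuple


def split_flags(advanced_arg: str,
                known_flags: Iterable[str] = ("--no-cache", "--allow-cloudio")
                ) -> Tuple[Set[str], str]:
    """Return the set of flags present and the remaining free-text requirement."""
    found: Set[str] = set()
    rest = advanced_arg
    for flag in known_flags:
        if flag in rest:
            found.add(flag)
            # Remove the flag wherever it appears, then trim stray whitespace.
            rest = rest.replace(flag, "").strip()
    return found, rest


if __name__ == "__main__":
    print(split_flags("--allow-cloudio --no-cache keep author names"))
    # lstrip strips any leading characters from the given set, flag or not:
    print(repr("each term stays".lstrip("--no-cache")))
```

The second print shows the pitfall: the word "each" is removed even though the string contains no flag, because every one of its letters appears in `"--no-cache"`.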
--- a/crazy_functions/Latex_Function_Wrap.py
+++ b/crazy_functions/Latex_Function_Wrap.py
@@ -30,6 +30,8 @@ class Arxiv_Localize(GptAcademicPluginTemplate):
                            default_value="", type="string").model_dump_json(), # 高级参数输入区，自动同步
            "allow_cache":
                ArgProperty(title="是否允许从缓存中调取结果", options=["允许缓存", "从头执行"], default_value="允许缓存", description="无", type="dropdown").model_dump_json(),
            "allow_cloudio":
                ArgProperty(title="是否允许从GPTAC学术云下载(或者上传)翻译结果(仅针对Arxiv论文)", options=["允许", "禁止"], default_value="禁止", description="共享文献，互助互利", type="dropdown").model_dump_json(),
        }
        return gui_definition
@@ -38,9 +40,14 @@ class Arxiv_Localize(GptAcademicPluginTemplate):
        执行插件
        """
        allow_cache = plugin_kwargs["allow_cache"]
        allow_cloudio = plugin_kwargs["allow_cloudio"]
        advanced_arg = plugin_kwargs["advanced_arg"]
        if allow_cache == "从头执行": plugin_kwargs["advanced_arg"] = "--no-cache " + plugin_kwargs["advanced_arg"]
        # 从云端下载翻译结果，以及上传翻译结果到云端；人人为我，我为人人。
        if allow_cloudio == "允许": plugin_kwargs["advanced_arg"] = "--allow-cloudio " + plugin_kwargs["advanced_arg"]
        yield from Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
--- a/crazy_functions/Markdown_Translate.py
+++ b/crazy_functions/Markdown_Translate.py
@@ -65,7 +65,7 @@ def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
            pfg.file_contents.append(file_content)
    #  <-------- 拆分过长的Markdown文件 ---------->
-    pfg.run_file_split(max_token_limit=2048)
+    pfg.run_file_split(max_token_limit=1024)
    n_split = len(pfg.sp_file_contents)
    #  <-------- 多线程翻译开始 ---------->
--- a/crazy_functions/Rag_Interface.py
+++ b/crazy_functions/Rag_Interface.py
@@ -2,20 +2,7 @@ from toolbox import CatchException, update_ui, get_conf, get_log_folder, update_
 from crazy_functions.crazy_utils import input_clipping
 from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
 VECTOR_STORE_TYPE = "Milvus"
 if VECTOR_STORE_TYPE == "Milvus":
    try:
        from crazy_functions.rag_fns.milvus_worker import MilvusRagWorker as LlamaIndexRagWorker
    except:
        VECTOR_STORE_TYPE = "Simple"
 if VECTOR_STORE_TYPE == "Simple":
    from crazy_functions.rag_fns.llama_index_worker import LlamaIndexRagWorker
 RAG_WORKER_REGISTER = {}
 MAX_HISTORY_ROUND = 5
 MAX_CONTEXT_TOKEN_LIMIT = 4096
 REMEMBER_PREVIEW = 1000
@@ -23,6 +10,16 @@ REMEMBER_PREVIEW = 1000
@CatchException
 def Rag问答(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
    # import vector store lib
    VECTOR_STORE_TYPE = "Milvus"
    if VECTOR_STORE_TYPE == "Milvus":
        try:
            from crazy_functions.rag_fns.milvus_worker import MilvusRagWorker as LlamaIndexRagWorker
        except:
            VECTOR_STORE_TYPE = "Simple"
    if VECTOR_STORE_TYPE == "Simple":
        from crazy_functions.rag_fns.llama_index_worker import LlamaIndexRagWorker
    # 1. we retrieve rag worker from global context
    user_name = chatbot.get_user()
    checkpoint_dir = get_log_folder(user_name, plugin_name='experimental_rag')
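
Review note on Rag_Interface.py: moving the Milvus/Simple import selection from module level into `Rag问答` defers the heavy imports until the plugin actually runs, which fits the `try`/`except` registration added in crazy_functional.py. The pattern can be sketched generically as below. The helper `load_first_available` is hypothetical, and it catches only `ImportError` and `AttributeError`, which is narrower than the bare `except` in the patch.

```python
import importlib
from typing import Any, Iterable, Tuple


def load_first_available(candidates: Iterable[Tuple[str, str]]) -> Tuple[str, Any]:
    """Import the first (module_name, attribute) pair that resolves.

    Called inside the function that needs the backend, so importing the
    enclosing module never fails just because an optional dependency is absent.
    """
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return module_name, getattr(module, attr)
        except (ImportError, AttributeError):
            continue
    raise ImportError("no candidate backend could be imported")


if __name__ == "__main__":
    # The first name is deliberately missing; `json` stands in for the fallback backend.
    name, cls = load_first_available([
        ("nonexistent_backend_xyz", "Worker"),
        ("json", "JSONDecoder"),
    ])
    print(name, cls.__name__)
```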
--- a/crazy_functions/Social_Helper.py
+++ b/crazy_functions/Social_Helper.py
@@ -1,7 +1,13 @@
 import pickle, os, random
 from toolbox import CatchException, update_ui, get_conf, get_log_folder, update_ui_lastest_msg
 from crazy_functions.crazy_utils import input_clipping
 from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
-import pickle, os
+from request_llms.bridge_all import predict_no_ui_long_connection
 from crazy_functions.json_fns.select_tool import structure_output, select_tool
 from pydantic import BaseModel, Field
 from loguru import logger
 from typing import List
 SOCIAL_NETWOK_WORKER_REGISTER = {}
@@ -9,7 +15,7 @@ class SocialNetwork():
    def __init__(self):
        self.people = []
-class SocialNetworkWorker():
+class SaveAndLoad():
    def __init__(self, user_name, llm_kwargs, auto_load_checkpoint=True, checkpoint_dir=None) -> None:
        self.user_name = user_name
        self.checkpoint_dir = checkpoint_dir
@@ -41,8 +47,105 @@ class SocialNetworkWorker():
            return SocialNetwork()
 class Friend(BaseModel):
    friend_name: str = Field(description="name of a friend")
    friend_description: str = Field(description="description of a friend (everything about this friend)")
    friend_relationship: str = Field(description="The relationship with a friend (e.g. friend, family, colleague)")
 class FriendList(BaseModel):
    friends_list: List[Friend] = Field(description="The list of friends")
 class SocialNetworkWorker(SaveAndLoad):
    def ai_socail_advice(self, prompt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type):
        pass
    def ai_remove_friend(self, prompt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type):
        pass
    def ai_list_friends(self, prompt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type):
        pass
    def ai_add_multi_friends(self, prompt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type):
        friend, err_msg = structure_output(
            txt=prompt,
            prompt="根据提示, 解析多个联系人的身份信息\n\n",
            err_msg=f"不能理解该联系人",
            run_gpt_fn=run_gpt_fn,
            pydantic_cls=FriendList
        )
        if friend.friends_list:
            for f in friend.friends_list: 
                self.add_friend(f)
            msg = f"成功添加{len(friend.friends_list)}个联系人: {str(friend.friends_list)}"
            yield from update_ui_lastest_msg(lastmsg=msg, chatbot=chatbot, history=history, delay=0)
    def run(self, txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
        prompt = txt
        run_gpt_fn = lambda inputs, sys_prompt: predict_no_ui_long_connection(inputs=inputs, llm_kwargs=llm_kwargs, history=[], sys_prompt=sys_prompt, observe_window=[])
        self.tools_to_select = {
            "SocialAdvice":{
                "explain_to_llm": "如果用户希望获取社交指导，调用SocialAdvice生成一些社交建议",
                "callback": self.ai_socail_advice,
            },
            "AddFriends":{
                "explain_to_llm": "如果用户给出了联系人，调用AddMultiFriends把联系人添加到数据库",
                "callback": self.ai_add_multi_friends,
            },
            "RemoveFriend":{
                "explain_to_llm": "如果用户希望移除某个联系人，调用RemoveFriend",
                "callback": self.ai_remove_friend,
            },
            "ListFriends":{
                "explain_to_llm": "如果用户列举联系人，调用ListFriends",
                "callback": self.ai_list_friends,
            }
        }
        try:
            Explaination = '\n'.join([f'{k}: {v["explain_to_llm"]}' for k, v in self.tools_to_select.items()])
            class UserSociaIntention(BaseModel):
                intention_type: str = Field(
                    description=
                        f"The type of user intention. You must choose from {self.tools_to_select.keys()}.\n\n" 
                        f"Explaination:\n{Explaination}", 
                    default="SocialAdvice"
                )
            pydantic_cls_instance, err_msg = select_tool(
                prompt=txt,
                run_gpt_fn=run_gpt_fn,
                pydantic_cls=UserSociaIntention
            )
        except Exception as e:
            yield from update_ui_lastest_msg(
                lastmsg=f"无法理解用户意图 {e}", 
                chatbot=chatbot, 
                history=history, 
                delay=0
            )
            return
        intention_type = pydantic_cls_instance.intention_type
        intention_callback = self.tools_to_select[pydantic_cls_instance.intention_type]['callback']
        yield from intention_callback(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type)
    def add_friend(self, friend):
        # check whether the friend is already in the social network
        for f in self.social_network.people:
            if f.friend_name == friend.friend_name:
                f.friend_description = friend.friend_description
                f.friend_relationship = friend.friend_relationship
                logger.info(f"Repeated friend, update info: {friend}")
                return
        logger.info(f"Add a new friend: {friend}")
        self.social_network.people.append(friend)
        return
@CatchException
-def I人助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request, num_day=5):
+def I人助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
    # 1. we retrieve worker from global context
    user_name = chatbot.get_user()
@@ -58,8 +161,7 @@ def I人助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt,
        )
    # 2. save
-    social_network_worker.social_network.people.append("张三")
+    yield from social_network_worker.run(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
    social_network_worker.save_to_checkpoint(checkpoint_dir)
    chatbot.append(["good", "work"])
    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
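
Review note on Social_Helper.py: `SocialNetworkWorker.run` maps an LLM-selected intention name to a callback through `self.tools_to_select`, and `add_friend` updates an existing entry when the name matches instead of appending a duplicate. The sketch below reproduces those two ideas without pydantic or an LLM call. The `Network` class, the `dispatch` function, and the fallback to a default tool for unknown names are assumptions made for illustration; the patch indexes the dictionary directly.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Friend:
    friend_name: str
    friend_description: str
    friend_relationship: str


@dataclass
class Network:
    people: List[Friend] = field(default_factory=list)

    def add_friend(self, friend: Friend) -> str:
        """Update the entry with the same name, otherwise append a new one."""
        for existing in self.people:
            if existing.friend_name == friend.friend_name:
                existing.friend_description = friend.friend_description
                existing.friend_relationship = friend.friend_relationship
                return "updated"
        self.people.append(friend)
        return "added"


def dispatch(intention: str, tools: Dict[str, Callable[[], str]],
             default: str = "SocialAdvice") -> str:
    """Look up the callback for `intention`; unknown names use the default tool."""
    handler = tools.get(intention, tools[default])
    return handler()


if __name__ == "__main__":
    net = Network()
    print(net.add_friend(Friend("Ann", "met at a conference", "colleague")))
    print(net.add_friend(Friend("Ann", "now a co-author", "colleague")))
    tools = {
        "SocialAdvice": lambda: "advice",
        "ListFriends": lambda: ", ".join(p.friend_name for p in net.people),
    }
    print(dispatch("ListFriends", tools))
```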
--- a/crazy_functions/SourceCode_Comment.py
+++ b/crazy_functions/SourceCode_Comment.py
@@ -6,7 +6,10 @@ from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_ver
 from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
 from crazy_functions.agent_fns.python_comment_agent import PythonCodeComment
 from crazy_functions.diagram_fns.file_tree import FileNode
 from crazy_functions.agent_fns.watchdog import WatchDog
 from shared_utils.advanced_markdown_format import markdown_convertion_for_file
 from loguru import logger
 def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
@@ -24,12 +27,13 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
        file_tree_struct.add_file(file_path, file_path)
    # <Step 1: analyze the files one by one, multi-threaded>
    lang = "" if not plugin_kwargs["use_chinese"] else " (you must use Chinese)"
    for index, fp in enumerate(file_manifest):
        # read the file
        with open(fp, 'r', encoding='utf-8', errors='replace') as f:
            file_content = f.read()
        prefix = ""
-        i_say = prefix + f'Please conclude the following source code at {os.path.relpath(fp, project_folder)} with only one sentence, the code is:\n```{file_content}```'
+        i_say = prefix + f'Please conclude the following source code at {os.path.relpath(fp, project_folder)} with only one sentence{lang}, the code is:\n```{file_content}```'
        i_say_show_user = prefix + f'[{index+1}/{len(file_manifest)}] 请用一句话对下面的程序文件做一个整体概述: {fp}'
        # assemble the request content
        MAX_TOKEN_SINGLE_FILE = 2560
@@ -37,7 +41,7 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
        inputs_array.append(i_say)
        inputs_show_user_array.append(i_say_show_user)
        history_array.append([])
-        sys_prompt_array.append("You are a software architecture analyst analyzing a source code project. Do not dig into details, tell me what the code is doing in general. Your answer must be short, simple and clear.")
+        sys_prompt_array.append(f"You are a software architecture analyst analyzing a source code project. Do not dig into details, tell me what the code is doing in general. Your answer must be short, simple and clear{lang}.")
    # All files are read; spawn one request thread per source file and send it to the LLM for analysis
    gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
        inputs_array = inputs_array,
@@ -50,10 +54,20 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
    )
    # <Step 2: process the files one by one and generate commented versions>
    tasks = ["" for _ in range(len(file_manifest))]
    def bark_fn(tasks):
        for i in range(len(tasks)): tasks[i] = "watchdog is dead"
    wd = WatchDog(timeout=10, bark_fn=lambda: bark_fn(tasks), interval=3, msg="ThreadWatcher timeout")
    wd.begin_watch()
    from concurrent.futures import ThreadPoolExecutor
    executor = ThreadPoolExecutor(max_workers=get_conf('DEFAULT_WORKER_NUM'))
-    def _task_multi_threading(i_say, gpt_say, fp, file_tree_struct):
+    def _task_multi_threading(i_say, gpt_say, fp, file_tree_struct, index):
-        pcc = PythonCodeComment(llm_kwargs, language='English')
+        language = 'Chinese' if plugin_kwargs["use_chinese"] else 'English'
        def observe_window_update(x):
            if tasks[index] == "watchdog is dead":
                raise TimeoutError("ThreadWatcher: watchdog is dead")
            tasks[index] = x
        pcc = PythonCodeComment(llm_kwargs, plugin_kwargs, language=language, observe_window_update=observe_window_update)
        pcc.read_file(path=fp, brief=gpt_say)
        revised_path, revised_content = pcc.begin_comment_source_code(None, None)
        file_tree_struct.manifest[fp].revised_path = revised_path
@@ -65,7 +79,8 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
        with open("crazy_functions/agent_fns/python_comment_compare.html", 'r', encoding='utf-8') as f:
            html_template = f.read()
        warp = lambda x: "```python\n\n" + x + "\n\n```"
-        from themes.theme import advanced_css
+        from themes.theme import load_dynamic_theme
        _, advanced_css, _, _ = load_dynamic_theme("Default")
        html_template = html_template.replace("ADVANCED_CSS", advanced_css)
        html_template = html_template.replace("REPLACE_CODE_FILE_LEFT", pcc.get_markdown_block_in_html(markdown_convertion_for_file(warp(pcc.original_content))))
        html_template = html_template.replace("REPLACE_CODE_FILE_RIGHT", pcc.get_markdown_block_in_html(markdown_convertion_for_file(warp(revised_content))))
@@ -73,17 +88,21 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
        file_tree_struct.manifest[fp].compare_html = compare_html_path
        with open(compare_html_path, 'w', encoding='utf-8') as f:
            f.write(html_template)
-        # print('done 1')
+        tasks[index] = ""
    chatbot.append([None, f"正在处理:"])
    futures = []
    index = 0
    for i_say, gpt_say, fp in zip(gpt_response_collection[0::2], gpt_response_collection[1::2], file_manifest):
-        future = executor.submit(_task_multi_threading, i_say, gpt_say, fp, file_tree_struct)
+        future = executor.submit(_task_multi_threading, i_say, gpt_say, fp, file_tree_struct, index)
        index += 1
        futures.append(future)
    # <Step 3: wait for the tasks to finish>
    cnt = 0
    while True:
        cnt += 1
        wd.feed()
        time.sleep(3)
        worker_done = [h.done() for h in futures]
        remain = len(worker_done) - sum(worker_done)
@@ -92,14 +111,18 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
        preview_html_list = []
        for done, fp in zip(worker_done, file_manifest):
            if not done: continue
-            preview_html_list.append(file_tree_struct.manifest[fp].compare_html)
+            if hasattr(file_tree_struct.manifest[fp], 'compare_html'):
                preview_html_list.append(file_tree_struct.manifest[fp].compare_html)
            else:
                logger.error(f"文件: {fp} 的注释结果未能成功")
        file_links = generate_file_link(preview_html_list)
        yield from update_ui_lastest_msg(
-            f"剩余源文件数量: {remain}.\n\n" +
-            f"已完成的文件: {sum(worker_done)}.\n\n" +
+            f"当前任务: <br/>{'<br/>'.join(tasks)}.<br/>" +
+            f"剩余源文件数量: {remain}.<br/>" +
+            f"已完成的文件: {sum(worker_done)}.<br/>" +
            file_links +
-            "\n\n" +
+            "<br/>" +
            ''.join(['.']*(cnt % 10 + 1)
        ), chatbot=chatbot, history=history, delay=0)
        yield from update_ui(chatbot=chatbot, history=[]) # refresh the UI
@@ -120,6 +143,7 @@ def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
@CatchException
 def 注释Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
    history = []    # clear the history to avoid input overflow
    plugin_kwargs["use_chinese"] = plugin_kwargs.get("use_chinese", False)
    import glob, os
    if os.path.exists(txt):
        project_folder = txt
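Reviewer note: the watchdog pattern used in this file (a watcher thread that overwrites every slot of `tasks` with "watchdog is dead" once the main loop stops calling `feed()`) can be illustrated with a minimal, self-contained sketch. `SimpleWatchDog` is a hypothetical stand-in written for illustration only; it is not the project's `WatchDog` class, whose internals are not shown in this diff, and the short timeouts are assumptions chosen so the example finishes quickly.

```python
import threading
import time


class SimpleWatchDog:
    """Hypothetical stand-in for illustration, not the project's WatchDog."""

    def __init__(self, timeout: float, bark_fn, interval: float = 0.01) -> None:
        self.timeout = timeout
        self.bark_fn = bark_fn
        self.interval = interval
        self.last_feed = time.monotonic()
        self._stop = threading.Event()

    def feed(self) -> None:
        # Called by the main loop to signal that it is still alive.
        self.last_feed = time.monotonic()

    def _watch(self) -> None:
        # Poll until stopped; bark once if nobody fed the dog in time.
        while not self._stop.wait(self.interval):
            if time.monotonic() - self.last_feed > self.timeout:
                self.bark_fn()
                return

    def begin_watch(self) -> threading.Thread:
        thread = threading.Thread(target=self._watch, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()


tasks = ["", ""]


def bark_fn() -> None:
    # Mark every task slot so that worker callbacks can notice and abort.
    for i in range(len(tasks)):
        tasks[i] = "watchdog is dead"


dog = SimpleWatchDog(timeout=0.05, bark_fn=bark_fn)
watcher = dog.begin_watch()
watcher.join(timeout=2)  # nobody feeds the dog here, so it barks and exits
```

In the diff above, each worker's `observe_window_update` callback checks its slot for that marker and raises `TimeoutError`, which is how abandoned threads are told to stop.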
--- a/crazy_functions/SourceCode_Comment_Wrap.py
+++ b/crazy_functions/SourceCode_Comment_Wrap.py
@@ -0,0 +1,36 @@
 from toolbox import get_conf, update_ui
 from crazy_functions.plugin_template.plugin_class_template import GptAcademicPluginTemplate, ArgProperty
 from crazy_functions.SourceCode_Comment import 注释Python项目
 class SourceCodeComment_Wrap(GptAcademicPluginTemplate):
    def __init__(self):
        """
        Note that `execute` runs in a different thread, so be very careful when defining and using class variables!
        """
        pass
    def define_arg_selection_menu(self):
        """
        Define the plugin's secondary option menu
        """
        gui_definition = {
            "main_input":
                ArgProperty(title="路径", description="程序路径（上传文件后自动填写）", default_value="", type="string").model_dump_json(), # main input, synced automatically from the input box
            "use_chinese":
                ArgProperty(title="注释语言", options=["英文", "中文"], default_value="英文", description="无", type="dropdown").model_dump_json(),
            # "use_emoji":
                # ArgProperty(title="在注释中使用emoji", options=["禁止", "允许"], default_value="禁止", description="无", type="dropdown").model_dump_json(),
        }
        return gui_definition
    def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
        """
        Run the plugin
        """
        # The dropdown yields "中文" (Chinese) or "英文" (English); downstream code expects a boolean
        plugin_kwargs["use_chinese"] = plugin_kwargs["use_chinese"] == "中文"
        yield from 注释Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
--- a/crazy_functions/agent_fns/python_comment_agent.py
+++ b/crazy_functions/agent_fns/python_comment_agent.py
@@ -68,6 +68,7 @@ Be aware:
 1. You must NOT modify the indent of code.
 2. You are NOT authorized to change or translate non-comment code, and you are NOT authorized to add empty lines either.
 3. Use {LANG} to add comments and docstrings. Do NOT translate Chinese that is already in the code.
 4. Besides adding a docstring, use the ⭐ symbol to annotate the most core and important line of code within the function, explaining its role.
 ------------------ Example ------------------
 INPUT:
@@ -116,10 +117,66 @@ def zip_result(folder):
 '''
 revise_funtion_prompt_chinese = '''
 您需要阅读以下代码，并根据以下说明修订源代码({FILE_BASENAME}):
 1. 如果源代码中包含函数的话, 你应该分析给定函数实现了什么功能
 2. 如果源代码中包含函数的话, 你需要为函数添加docstring, docstring必须使用中文
 请注意：
 1. 你不得修改代码的缩进
 2. 你无权更改或翻译代码中的非注释部分，也不允许添加空行
 3. 使用 {LANG} 添加注释和文档字符串。不要翻译代码中已有的中文
 4. 除了添加docstring之外, 使用⭐符号给该函数中最核心、最重要的一行代码添加注释，并说明其作用
 ------------------ 示例 ------------------
 INPUT:
 ```
 L0000 |
 L0001 |def zip_result(folder):
 L0002 |    t = gen_time_str()
 L0003 |    zip_folder(folder, get_log_folder(), f"result.zip")
 L0004 |    return os.path.join(get_log_folder(), f"result.zip")
 L0005 |
 L0006 |
 ```
 OUTPUT:
 <instruction_1_purpose>
 该函数用于压缩指定文件夹，并返回生成的`zip`文件的路径。
 </instruction_1_purpose>
 <instruction_2_revised_code>
 ```
 def zip_result(folder):
    """
    该函数将指定的文件夹压缩成ZIP文件, 并将其存储在日志文件夹中。
    输入参数:
        folder (str): 需要压缩的文件夹的路径。
    返回值:
        str: 日志文件夹中创建的ZIP文件的路径。
    """
    t = gen_time_str()
    zip_folder(folder, get_log_folder(), f"result.zip")  # ⭐ 执行文件夹的压缩
    return os.path.join(get_log_folder(), f"result.zip")
 ```
 </instruction_2_revised_code>
 ------------------ End of Example ------------------
 ------------------ the real INPUT you need to process NOW ({FILE_BASENAME}) ------------------
 ```
 {THE_CODE}
 ```
 {INDENT_REMINDER}
 {BRIEF_REMINDER}
 {HINT_REMINDER}
 '''
 class PythonCodeComment():
-    def __init__(self, llm_kwargs, language) -> None:
+    def __init__(self, llm_kwargs, plugin_kwargs, language, observe_window_update) -> None:
        self.original_content = ""
        self.full_context = []
        self.full_context_with_line_no = []
@@ -127,7 +184,13 @@ class PythonCodeComment():
        self.page_limit = 100 # 100 lines of code each page
        self.ignore_limit = 20
        self.llm_kwargs = llm_kwargs
        self.plugin_kwargs = plugin_kwargs
        self.language = language
        self.observe_window_update = observe_window_update
        if self.language.lower() == "chinese":  # callers pass 'Chinese' / 'English'
            self.core_prompt = revise_funtion_prompt_chinese
        else:
            self.core_prompt = revise_funtion_prompt
        self.path = None
        self.file_basename = None
        self.file_brief = ""
@@ -258,7 +321,7 @@ class PythonCodeComment():
        hint_reminder = "" if hint is None else f"(Reminder: do not ignore or modify code such as `{hint}`, provide complete code in the OUTPUT.)"
        self.llm_kwargs['temperature'] = 0
        result = predict_no_ui_long_connection(
-            inputs=revise_funtion_prompt.format(
+            inputs=self.core_prompt.format(
                LANG=self.language, 
                FILE_BASENAME=self.file_basename, 
                THE_CODE=code, 
@@ -348,6 +411,7 @@ class PythonCodeComment():
            try:
                # yield from update_ui_lastest_msg(f"({self.file_basename}) 正在读取下一段代码片段:\n", chatbot=chatbot, history=history, delay=0)
                next_batch, line_no_start, line_no_end = self.get_next_batch()
                self.observe_window_update(f"正在处理{self.file_basename} - {line_no_start}/{len(self.full_context)}\n")
                # yield from update_ui_lastest_msg(f"({self.file_basename}) 处理代码片段:\n\n{next_batch}", chatbot=chatbot, history=history, delay=0)
                hint = None
--- a/crazy_functions/ast_fns/comment_remove.py
+++ b/crazy_functions/ast_fns/comment_remove.py
@@ -1,39 +1,47 @@
-import ast
+import token
 import tokenize
 import copy
 import io
-class CommentRemover(ast.NodeTransformer):
-    def visit_FunctionDef(self, node):
-        # 移除函数的文档字符串
-        if (node.body and isinstance(node.body[0], ast.Expr) and
-                isinstance(node.body[0].value, ast.Str)):
-            node.body = node.body[1:]
-        self.generic_visit(node)
-        return node
-    def visit_ClassDef(self, node):
-        # 移除类的文档字符串
-        if (node.body and isinstance(node.body[0], ast.Expr) and
-                isinstance(node.body[0].value, ast.Str)):
-            node.body = node.body[1:]
-        self.generic_visit(node)
-        return node
-    def visit_Module(self, node):
-        # 移除模块的文档字符串
-        if (node.body and isinstance(node.body[0], ast.Expr) and
-                isinstance(node.body[0].value, ast.Str)):
-            node.body = node.body[1:]
-        self.generic_visit(node)
-        return node
-    
+def remove_python_comments(input_source: str) -> str:
+    source_flag = copy.copy(input_source)
+    source = io.StringIO(input_source)
+    ls = input_source.split('\n')
+    prev_toktype = token.INDENT
+    readline = source.readline
+    def get_char_index(lineno, col):
+        # find the index of the char in the source code
+        if lineno == 1:
+            return len('\n'.join(ls[:(lineno-1)])) + col
+        else:
+            return len('\n'.join(ls[:(lineno-1)])) + col + 1
+
+    def replace_char_between(start_lineno, start_col, end_lineno, end_col, source, replace_char, ls):
+        # replace char between start_lineno, start_col and end_lineno, end_col with replace_char, but keep '\n' and ' '
+        b = get_char_index(start_lineno, start_col)
+        e = get_char_index(end_lineno, end_col)
+        for i in range(b, e):
+            if source[i] == '\n':
+                source = source[:i] + '\n' + source[i+1:]
+            elif source[i] == ' ':
+                source = source[:i] + ' ' + source[i+1:]
+            else:
+                source = source[:i] + replace_char + source[i+1:]
+        return source
+    tokgen = tokenize.generate_tokens(readline)
+    for toktype, ttext, (slineno, scol), (elineno, ecol), ltext in tokgen:
+        if toktype == token.STRING and (prev_toktype == token.INDENT):
+            source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
+        elif toktype == token.STRING and (prev_toktype == token.NEWLINE):
+            source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
+        elif toktype == tokenize.COMMENT:
+            source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
+        prev_toktype = toktype
+    return source_flag
-def remove_python_comments(source_code):
-    # 解析源代码为 AST
-    tree = ast.parse(source_code)
-    # 移除注释
-    transformer = CommentRemover()
-    tree = transformer.visit(tree)
-    # 将处理后的 AST 转换回源代码
-    return ast.unparse(tree)
 # 示例使用
 if __name__ == "__main__":
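Reviewer note: the new `remove_python_comments` walks the token stream and overwrites comment and docstring spans with spaces, so line and column positions stay stable. A simplified sketch of that idea follows. It is an assumption-laden illustration, not the patch's code: the hypothetical `strip_comments` only blanks `#` comments (docstrings are left alone) and edits one line at a time instead of indexing into the flat source string.

```python
import io
import tokenize


def strip_comments(source: str) -> str:
    """Blank out `#` comments while keeping every line and column in place."""
    lines = source.split("\n")
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue
        row, col = tok.start      # rows are 1-based, columns are 0-based
        end_col = tok.end[1]      # a comment token never spans several lines
        line = lines[row - 1]
        lines[row - 1] = line[:col] + " " * (end_col - col) + line[end_col:]
    return "\n".join(lines)


sample = "x = 1  # set x\ny = '# not a comment'\n"
cleaned = strip_comments(sample)
```

Because the tokenizer distinguishes `COMMENT` tokens from `STRING` tokens, the `#` inside the string literal is left untouched, which a plain regular expression would not guarantee.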
--- a/crazy_functions/doc_fns/batch_file_query_doc.py
+++ b/crazy_functions/doc_fns/batch_file_query_doc.py
@@ -0,0 +1,450 @@
 import os
 import time
 from abc import ABC, abstractmethod
 from datetime import datetime
 from docx import Document
 from docx.enum.style import WD_STYLE_TYPE
 from docx.enum.text import WD_PARAGRAPH_ALIGNMENT, WD_LINE_SPACING
 from docx.oxml.ns import qn
 from docx.shared import Cm, Inches, Pt, RGBColor
 from typing import Dict, List, Tuple
 class DocumentFormatter(ABC):
    """Base class for document formatters; defines the basic formatting interface"""
    def __init__(self, final_summary: str, file_summaries_map: Dict, failed_files: List[Tuple]):
        self.final_summary = final_summary
        self.file_summaries_map = file_summaries_map
        self.failed_files = failed_files
    @abstractmethod
    def format_failed_files(self) -> str:
        """Format the list of failed files"""
        pass
    @abstractmethod
    def format_file_summaries(self) -> str:
        """Format the per-file summaries"""
        pass
    @abstractmethod
    def create_document(self) -> str:
        """Create the complete document"""
        pass
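 class WordFormatter(DocumentFormatter):
    """Word document generator - follows the Chinese government official-document format standard (GB/T 9704-2012), with optimizations"""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.doc = Document()
        self._setup_document()
        self._create_styles()
        # Initialize the three-level heading numbering system
        self.numbers = {
            1: 0,  # level-1 heading counter
            2: 0,  # level-2 heading counter
            3: 0  # level-3 heading counter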
        }
    def _setup_document(self):
        """Set up basic document formatting, including page setup and header"""
        sections = self.doc.sections
        for section in sections:
            # Set the page size to A4
            section.page_width = Cm(21)
            section.page_height = Cm(29.7)
            # Set the page margins
            section.top_margin = Cm(3.7)  # top margin 37 mm
            section.bottom_margin = Cm(3.5)  # bottom margin 35 mm
            section.left_margin = Cm(2.8)  # left margin 28 mm
            section.right_margin = Cm(2.6)  # right margin 26 mm
            # Set the header and footer distances
            section.header_distance = Cm(2.0)
            section.footer_distance = Cm(2.0)
            # Add the header
            header = section.header
            header_para = header.paragraphs[0]
            header_para.alignment = WD_PARAGRAPH_ALIGNMENT.RIGHT
            header_run = header_para.add_run("该文档由GPT-academic生成")
            header_run.font.name = '仿宋'
            header_run._element.rPr.rFonts.set(qn('w:eastAsia'), '仿宋')
            header_run.font.size = Pt(9)
    def _create_styles(self):
        """Create the document styles"""
        # Create the body-text style
        style = self.doc.styles.add_style('Normal_Custom', WD_STYLE_TYPE.PARAGRAPH)
        style.font.name = '仿宋'
        style._element.rPr.rFonts.set(qn('w:eastAsia'), '仿宋')
        style.font.size = Pt(14)
        style.paragraph_format.line_spacing_rule = WD_LINE_SPACING.ONE_POINT_FIVE
        style.paragraph_format.space_after = Pt(0)
        style.paragraph_format.first_line_indent = Pt(28)
        # Create the heading styles for each level
        self._create_heading_style('Title_Custom', '方正小标宋简体', 32, WD_PARAGRAPH_ALIGNMENT.CENTER)
        self._create_heading_style('Heading1_Custom', '黑体', 22, WD_PARAGRAPH_ALIGNMENT.LEFT)
        self._create_heading_style('Heading2_Custom', '黑体', 18, WD_PARAGRAPH_ALIGNMENT.LEFT)
        self._create_heading_style('Heading3_Custom', '黑体', 16, WD_PARAGRAPH_ALIGNMENT.LEFT)
    def _create_heading_style(self, style_name: str, font_name: str, font_size: int, alignment):
        """Create a heading style"""
        style = self.doc.styles.add_style(style_name, WD_STYLE_TYPE.PARAGRAPH)
        style.font.name = font_name
        style._element.rPr.rFonts.set(qn('w:eastAsia'), font_name)
        style.font.size = Pt(font_size)
        style.font.bold = True
        style.paragraph_format.alignment = alignment
        style.paragraph_format.space_before = Pt(12)
        style.paragraph_format.space_after = Pt(12)
        style.paragraph_format.line_spacing_rule = WD_LINE_SPACING.ONE_POINT_FIVE
        return style
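    def _get_heading_number(self, level: int) -> str:
        """
        Generate the heading number
        Args:
            level: heading level (0-3)
        Returns:
            str: the formatted heading number
        """
        if level == 0:  # the main title is not numbered
            return ""
        self.numbers[level] += 1  # increment the counter at the current level
        # Reset the counters of all deeper levels
        for i in range(level + 1, 4):
            self.numbers[i] = 0
        # Return a differently formatted number depending on the level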
        if level == 1:
            return f"{self.numbers[1]}. "
        elif level == 2:
            return f"{self.numbers[1]}.{self.numbers[2]} "
        elif level == 3:
            return f"{self.numbers[1]}.{self.numbers[2]}.{self.numbers[3]} "
        return ""
    def _add_heading(self, text: str, level: int):
        """
        Add a numbered heading
        Args:
            text: heading text
            level: heading level (0-3)
        """
        style_map = {
            0: 'Title_Custom',
            1: 'Heading1_Custom',
            2: 'Heading2_Custom',
            3: 'Heading3_Custom'
        }
        number = self._get_heading_number(level)
        paragraph = self.doc.add_paragraph(style=style_map[level])
        if number:
            number_run = paragraph.add_run(number)
            font_size = 22 if level == 1 else (18 if level == 2 else 16)
            self._get_run_style(number_run, '黑体', font_size, True)
        text_run = paragraph.add_run(text)
        font_size = 32 if level == 0 else (22 if level == 1 else (18 if level == 2 else 16))
        self._get_run_style(text_run, '黑体', font_size, True)
        # Add the date below the main title
        if level == 0:
            date_paragraph = self.doc.add_paragraph()
            date_paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER
            date_run = date_paragraph.add_run(datetime.now().strftime('%Y年%m月%d日'))
            self._get_run_style(date_run, '仿宋', 16, False)
        return paragraph
    def _get_run_style(self, run, font_name: str, font_size: int, bold: bool = False):
        """Set the style of a text run object"""
        run.font.name = font_name
        run._element.rPr.rFonts.set(qn('w:eastAsia'), font_name)
        run.font.size = Pt(font_size)
        run.font.bold = bold
    def format_failed_files(self) -> str:
        """Format the list of failed files"""
        result = []
        if not self.failed_files:
            return "\n".join(result)
        result.append("处理失败文件:")
        for fp, reason in self.failed_files:
            result.append(f"• {os.path.basename(fp)}: {reason}")
        self._add_heading("处理失败文件", 1)
        for fp, reason in self.failed_files:
            self._add_content(f"• {os.path.basename(fp)}: {reason}", indent=False)
        self.doc.add_paragraph()
        return "\n".join(result)
    def _add_content(self, text: str, indent: bool = True):
        """Add body text"""
        paragraph = self.doc.add_paragraph(text, style='Normal_Custom')
        if not indent:
            paragraph.paragraph_format.first_line_indent = Pt(0)
        return paragraph
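    def format_file_summaries(self) -> str:
        """
        Format the per-file summaries, ensuring the correct heading hierarchy
        Returns:
            str: the formatted file-summary string
        Heading hierarchy rules:
        1. The level-1 heading is "各文件详细总结" (detailed summary of each file)
        2. If a file has a directory path:
           - the directory path becomes a level-2 heading (2.1, 2.2, etc.)
           - every file in that directory becomes a level-3 heading (2.1.1, 2.1.2, etc.)
        3. If a file has no directory path:
           - the file itself becomes a level-2 heading (2.1, 2.2, etc.)
        """
        result = []
        # First group the file paths by directory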
        file_groups = {}
        for path in sorted(self.file_summaries_map.keys()):
            dir_path = os.path.dirname(path)
            if dir_path not in file_groups:
                file_groups[dir_path] = []
            file_groups[dir_path].append(path)
        # Handle files without a directory
        root_files = file_groups.get("", [])
        if root_files:
            for path in sorted(root_files):
                file_name = os.path.basename(path)
                result.append(f"\n📄 {file_name}")
                result.append(self.file_summaries_map[path])
                # Files without a directory become level-2 headings
                self._add_heading(f"📄 {file_name}", 2)
                self._add_content(self.file_summaries_map[path])
                self.doc.add_paragraph()
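        # Handle files that live in a directory
        for dir_path in sorted(file_groups.keys()):
            if dir_path == "":  # skip root-level files, already handled
                continue
            # Add the directory as a level-2 heading
            result.append(f"\n📁 {dir_path}")
            self._add_heading(f"📁 {dir_path}", 2)
            # Every file in this directory becomes a level-3 heading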
            for path in sorted(file_groups[dir_path]):
                file_name = os.path.basename(path)
                result.append(f"\n📄 {file_name}")
                result.append(self.file_summaries_map[path])
                # Add the file name as a level-3 heading
                self._add_heading(f"📄 {file_name}", 3)
                self._add_content(self.file_summaries_map[path])
                self.doc.add_paragraph()
        return "\n".join(result)
    def create_document(self):
        """Create the complete Word document and return the document object"""
        # Reset all counters
        for level in self.numbers:
            self.numbers[level] = 0
        # Add the main title
        self._add_heading("文档总结报告", 0)
        self.doc.add_paragraph()
        # Add the overall summary
        self._add_heading("总体摘要", 1)
        self._add_content(self.final_summary)
        self.doc.add_paragraph()
        # Add the list of failed files (if any)
        if self.failed_files:
            self.format_failed_files()
        # Add the detailed per-file summaries
        self._add_heading("各文件详细总结", 1)
        self.format_file_summaries()
        return self.doc
 class MarkdownFormatter(DocumentFormatter):
    """Markdown document generator"""
    def format_failed_files(self) -> str:
        if not self.failed_files:
            return ""
        formatted_text = ["\n## ⚠️ 处理失败的文件"]
        for fp, reason in self.failed_files:
            formatted_text.append(f"- {os.path.basename(fp)}: {reason}")
        formatted_text.append("\n---")
        return "\n".join(formatted_text)
    def format_file_summaries(self) -> str:
        formatted_text = []
        sorted_paths = sorted(self.file_summaries_map.keys())
        current_dir = ""
        for path in sorted_paths:
            dir_path = os.path.dirname(path)
            if dir_path != current_dir:
                if dir_path:
                    formatted_text.append(f"\n## 📁 {dir_path}")
                current_dir = dir_path
            file_name = os.path.basename(path)
            formatted_text.append(f"\n### 📄 {file_name}")
            formatted_text.append(self.file_summaries_map[path])
            formatted_text.append("\n---")
        return "\n".join(formatted_text)
    def create_document(self) -> str:
        document = [
            "# 📑 文档总结报告",
            "\n## 总体摘要",
            self.final_summary
        ]
        if self.failed_files:
            document.append(self.format_failed_files())
        document.extend([
            "\n# 📚 各文件详细总结",
            self.format_file_summaries()
        ])
        return "\n".join(document)
 class HtmlFormatter(DocumentFormatter):
    """HTML document generator"""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.css_styles = """
        body {
            font-family: "Microsoft YaHei", Arial, sans-serif;
            line-height: 1.6;
            max-width: 1000px;
            margin: 0 auto;
            padding: 20px;
            color: #333;
        }
        h1 {
            color: #2c3e50;
            border-bottom: 2px solid #eee;
            padding-bottom: 10px;
            font-size: 24px;
            text-align: center;
        }
        h2 {
            color: #34495e;
            margin-top: 30px;
            font-size: 20px;
            border-left: 4px solid #3498db;
            padding-left: 10px;
        }
        h3 {
            color: #2c3e50;
            font-size: 18px;
            margin-top: 20px;
        }
        .summary {
            background-color: #f8f9fa;
            padding: 20px;
            border-radius: 5px;
            margin: 20px 0;
            box-shadow: 0 2px 4px rgba(0,0,0,0.1);
        }
        .details {
            margin-top: 40px;
        }
        .failed-files {
            background-color: #fff3f3;
            padding: 15px;
            border-left: 4px solid #e74c3c;
            margin: 20px 0;
        }
        .file-summary {
            background-color: #fff;
            padding: 15px;
            margin: 15px 0;
            border-radius: 4px;
            box-shadow: 0 1px 3px rgba(0,0,0,0.1);
        }
        """
    def format_failed_files(self) -> str:
        if not self.failed_files:
            return ""
        failed_files_html = ['<div class="failed-files">']
        failed_files_html.append("<h2>⚠️ 处理失败的文件</h2>")
        failed_files_html.append("<ul>")
        for fp, reason in self.failed_files:
            failed_files_html.append(f"<li><strong>{os.path.basename(fp)}:</strong> {reason}</li>")
        failed_files_html.append("</ul></div>")
        return "\n".join(failed_files_html)
    def format_file_summaries(self) -> str:
        formatted_html = []
        sorted_paths = sorted(self.file_summaries_map.keys())
        current_dir = ""
        for path in sorted_paths:
            dir_path = os.path.dirname(path)
            if dir_path != current_dir:
                if dir_path:
                    formatted_html.append(f'<h2>📁 {dir_path}</h2>')
                current_dir = dir_path
            file_name = os.path.basename(path)
            formatted_html.append('<div class="file-summary">')
            formatted_html.append(f'<h3>📄 {file_name}</h3>')
            formatted_html.append(f'<p>{self.file_summaries_map[path]}</p>')
            formatted_html.append('</div>')
        return "\n".join(formatted_html)
    def create_document(self) -> str:
        return f"""
        <!DOCTYPE html>
        <html>
        <head>
            <meta charset='utf-8'>
            <title>文档总结报告</title>
            <style>{self.css_styles}</style>
        </head>
        <body>
            <h1>📑 文档总结报告</h1>
            <h2>总体摘要</h2>
            <div class="summary">{self.final_summary}</div>
            {self.format_failed_files()}
            <div class="details">
                <h2>📚 各文件详细总结</h2>
                {self.format_file_summaries()}
            </div>
        </body>
        </html>
        """
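Reviewer note: the heading numbering in `WordFormatter._get_heading_number` (increment the counter at the current level, then reset every deeper level) is easy to check in isolation. The sketch below mirrors that logic in a standalone class; `HeadingCounter` and its `depth` parameter are hypothetical names introduced for illustration and do not appear in the patch.

```python
class HeadingCounter:
    """Standalone sketch of hierarchical heading numbering (1., 1.1, 1.1.1)."""

    def __init__(self, depth: int = 3) -> None:
        self.depth = depth
        self.numbers = {level: 0 for level in range(1, depth + 1)}

    def next(self, level: int) -> str:
        if level == 0:
            return ""  # the main title carries no number
        self.numbers[level] += 1
        # A new heading at this level restarts the numbering of deeper levels.
        for deeper in range(level + 1, self.depth + 1):
            self.numbers[deeper] = 0
        parts = [str(self.numbers[i]) for i in range(1, level + 1)]
        # Level 1 is written as "1. ", deeper levels as "1.1 " or "1.1.1 ".
        return ".".join(parts) + (". " if level == 1 else " ")


counter = HeadingCounter()
labels = [counter.next(level) for level in (1, 2, 3, 3, 2, 1)]
```

With this call sequence the second level-2 heading restarts the level-3 counter, and the second level-1 heading restarts both deeper counters.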
--- a/crazy_functions/json_fns/select_tool.py
+++ b/crazy_functions/json_fns/select_tool.py
@@ -0,0 +1,26 @@
 from crazy_functions.json_fns.pydantic_io import GptJsonIO, JsonStringError
 def structure_output(txt, prompt, err_msg, run_gpt_fn, pydantic_cls):
    gpt_json_io = GptJsonIO(pydantic_cls)
    analyze_res = run_gpt_fn(
        txt, 
        sys_prompt=prompt + gpt_json_io.format_instructions
    )
    try:
        friend = gpt_json_io.generate_output_auto_repair(analyze_res, run_gpt_fn)
    except JsonStringError as e:
        return None, err_msg
    err_msg = ""
    return friend, err_msg
 def select_tool(prompt, run_gpt_fn, pydantic_cls):
    pydantic_cls_instance, err_msg = structure_output(
        txt=prompt,
        prompt="根据提示, 分析应该调用哪个工具函数\n\n",
        err_msg="不能理解该联系人",
        run_gpt_fn=run_gpt_fn,
        pydantic_cls=pydantic_cls
    )
    return pydantic_cls_instance, err_msg
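Reviewer note: `structure_output` returns a `(instance, err_msg)` pair, with `None` plus the caller's message on failure and an empty message on success. A minimal sketch of that convention is shown below. It uses `json` and a dataclass as stand-ins, since the behaviour of `GptJsonIO` and of the pydantic classes is not visible in this diff; `ToolChoice` and `parse_tool_choice` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToolChoice:
    """Hypothetical stand-in for the pydantic class passed to select_tool."""
    intention_type: str


def parse_tool_choice(reply: str, err_msg: str) -> Tuple[Optional[ToolChoice], str]:
    """Return (choice, "") on success or (None, err_msg) when the reply is unusable."""
    try:
        payload = json.loads(reply)
        return ToolChoice(intention_type=payload["intention_type"]), ""
    except (json.JSONDecodeError, KeyError, TypeError):
        return None, err_msg


good, good_err = parse_tool_choice('{"intention_type": "add_friend"}', "cannot parse")
bad, bad_err = parse_tool_choice("not json at all", "cannot parse")
```

Callers can then branch on `instance is None` and surface `err_msg` to the UI instead of catching parser exceptions themselves.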
--- a/crazy_functions/latex_fns/latex_actions.py
+++ b/crazy_functions/latex_fns/latex_actions.py
@@ -3,7 +3,7 @@ import re
 import shutil
 import numpy as np
 from loguru import logger
-from toolbox import update_ui, update_ui_lastest_msg, get_log_folder
+from toolbox import update_ui, update_ui_lastest_msg, get_log_folder, gen_time_str
 from toolbox import get_conf, promote_file_to_downloadzone
 from crazy_functions.latex_fns.latex_toolbox import PRESERVE, TRANSFORM
 from crazy_functions.latex_fns.latex_toolbox import set_forbidden_text, set_forbidden_text_begin_end, set_forbidden_text_careful_brace
@@ -468,3 +468,70 @@ def write_html(sp_file_contents, sp_file_result, chatbot, project_folder):
    except:
        from toolbox import trimmed_format_exc
        logger.error('writing html result failed: ' + trimmed_format_exc())
 def upload_to_gptac_cloud_if_user_allow(chatbot, arxiv_id):
    try:
        # If the user allows it, upload the arXiv paper PDFs to the GPTAC academic cloud
        from toolbox import map_file_to_sha256
        # Check that things went well; skip if the expected file was not generated
        is_result_good = False
        for file_path in chatbot._cookies.get("files_to_promote", []):
            if file_path.endswith('translate_zh.pdf'):
                is_result_good = True
        if not is_result_good:
            return
        # Upload the files
        for file_path in chatbot._cookies.get("files_to_promote", []):
            align_name = None
            # normalized name
            for name in ['translate_zh.pdf', 'comparison.pdf']:
                if file_path.endswith(name): align_name = name
            # if match any align name
            if align_name:
                logger.info(f'Uploading to GPTAC cloud as the user has set `allow_cloud_io`: {file_path}')
                with open(file_path, 'rb') as f:
                    import requests
                    url = 'https://cloud-2.agent-matrix.com/arxiv_tf_paper_normal_upload'
                    files = {'file': (align_name, f, 'application/octet-stream')}
                    data = {
                        'arxiv_id': arxiv_id,
                        'file_hash': map_file_to_sha256(file_path),
                        'language': 'zh',
                        'trans_prompt': 'to_be_implemented',
                        'llm_model': 'to_be_implemented',
                        'llm_model_param': 'to_be_implemented',
                    }
                    resp = requests.post(url=url, files=files, data=data, timeout=30)
                logger.info(f'Upload finished ({resp.status_code}): {file_path}')
    except Exception:
        # A failed upload must not interrupt the program, because this is a secondary feature
        pass
 def check_gptac_cloud(arxiv_id, chatbot):
    import requests
    success = False
    downloaded = []
    try:
        for pdf_target in ['translate_zh.pdf', 'comparison.pdf']:
            url = 'https://cloud-2.agent-matrix.com/arxiv_tf_paper_normal_exist'
            data = {
                'arxiv_id': arxiv_id,
                'name': pdf_target,
            }
            resp = requests.post(url=url, data=data, timeout=30)
            cache_hit_result = resp.text.strip('"')
            if cache_hit_result.startswith("http"):
                url = cache_hit_result
                logger.info(f'Downloading from GPTAC cloud: {url}')
                resp = requests.get(url=url, timeout=30)
                target = os.path.join(get_log_folder(plugin_name='gptac_cloud'), gen_time_str(), pdf_target)
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with open(target, 'wb') as f:
                    f.write(resp.content)
                new_path = promote_file_to_downloadzone(target, chatbot=chatbot)
                success = True
                downloaded.append(new_path)
    except:
        pass
    return success, downloaded
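Note (not part of the patch): `check_gptac_cloud` treats the response body as a cache hit when, after stripping surrounding double quotes, it starts with `http`. The sketch below isolates that check so it can be exercised without the network. The helper name `parse_cache_hit` is hypothetical, and the response format is assumed only from the code above.

```python
from typing import Optional


def parse_cache_hit(resp_text: str) -> Optional[str]:
    """Return the download URL if the body looks like a cache hit, else None.

    Hypothetical helper: mirrors `resp.text.strip('"')` followed by
    `startswith("http")` in check_gptac_cloud.
    """
    candidate = resp_text.strip('"')
    return candidate if candidate.startswith("http") else None


if __name__ == "__main__":
    # Example bodies are made up for illustration only.
    print(parse_cache_hit('"https://example.com/translate_zh.pdf"'))
    print(parse_cache_hit('null'))
```

Keeping the decision in a pure function would make it easy to unit-test the cache lookup separately from the two HTTP requests.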
--- a/crazy_functions/latex_fns/latex_toolbox.py
+++ b/crazy_functions/latex_fns/latex_toolbox.py
@@ -644,6 +644,216 @@ def run_in_subprocess(func):
 def _merge_pdfs(pdf1_path, pdf2_path, output_path):
    try:
        logger.info("Merging PDFs using _merge_pdfs_ng")
        _merge_pdfs_ng(pdf1_path, pdf2_path, output_path)
    except:
        logger.info("Merging PDFs using _merge_pdfs_legacy")
        _merge_pdfs_legacy(pdf1_path, pdf2_path, output_path)
 def _merge_pdfs_ng(pdf1_path, pdf2_path, output_path):
    import PyPDF2  # PyPDF2 has a serious memory leak; run it in a subprocess so the memory is easy to release
    from PyPDF2.generic import NameObject, TextStringObject, ArrayObject, FloatObject, NumberObject
    Percent = 1
    # raise RuntimeError('PyPDF2 has a serious memory leak problem, please use other tools to merge PDF files.')
    # Open the first PDF file
    with open(pdf1_path, "rb") as pdf1_file:
        pdf1_reader = PyPDF2.PdfFileReader(pdf1_file)
        # Open the second PDF file
        with open(pdf2_path, "rb") as pdf2_file:
            pdf2_reader = PyPDF2.PdfFileReader(pdf2_file)
            # Create a new PDF file to store the merged pages
            output_writer = PyPDF2.PdfFileWriter()
            # Determine the number of pages in each PDF file
            num_pages = max(pdf1_reader.numPages, pdf2_reader.numPages)
            # Merge the pages from the two PDF files
            for page_num in range(num_pages):
                # Add the page from the first PDF file
                if page_num < pdf1_reader.numPages:
                    page1 = pdf1_reader.getPage(page_num)
                else:
                    page1 = PyPDF2.PageObject.createBlankPage(pdf1_reader)
                # Add the page from the second PDF file
                if page_num < pdf2_reader.numPages:
                    page2 = pdf2_reader.getPage(page_num)
                else:
                    page2 = PyPDF2.PageObject.createBlankPage(pdf1_reader)
                # Create a new empty page with double width
                new_page = PyPDF2.PageObject.createBlankPage(
                    width=int(
                        int(page1.mediaBox.getWidth())
                        + int(page2.mediaBox.getWidth()) * Percent
                    ),
                    height=max(page1.mediaBox.getHeight(), page2.mediaBox.getHeight()),
                )
                new_page.mergeTranslatedPage(page1, 0, 0)
                new_page.mergeTranslatedPage(
                    page2,
                    int(
                        int(page1.mediaBox.getWidth())
                        - int(page2.mediaBox.getWidth()) * (1 - Percent)
                    ),
                    0,
                )
                if "/Annots" in new_page:
                    annotations = new_page["/Annots"]
                    for i, annot in enumerate(annotations):
                        annot_obj = annot.get_object()
                        # Check whether the annotation is a link (/Link)
                        if annot_obj.get("/Subtype") == "/Link":
                            # Check whether it is an internal jump (/GoTo) or an external URI link (/URI)
                            action = annot_obj.get("/A")
                            if action:
                                if "/S" in action and action["/S"] == "/GoTo":
                                    # Internal link: jumps to a page inside the document
                                    dest = action.get("/D")  # target page or target position
                                    # if dest and annot.idnum in page2_annot_id:
                                    # if dest in pdf2_reader.named_destinations:
                                    if dest and page2.annotations:
                                        if annot in page2.annotations:
                                            # Fetch the jump information from the original file, including the target page
                                            destination = pdf2_reader.named_destinations[
                                                dest
                                            ]
                                            page_number = (
                                                pdf2_reader.get_destination_page_number(
                                                    destination
                                                )
                                            )
                                            # Update the jump target: go to the corresponding page at the given coordinates, e.g. (100, 150), with 100% zoom
                                            # "/D": [10, '/XYZ', 100, 100, 0]
                                            if destination.dest_array[1] == "/XYZ":
                                                annot_obj["/A"].update(
                                                    {
                                                        NameObject("/D"): ArrayObject(
                                                            [
                                                                NumberObject(page_number),
                                                                destination.dest_array[1],
                                                                FloatObject(
                                                                    destination.dest_array[
                                                                        2
                                                                    ]
                                                                    + int(
                                                                        page1.mediaBox.getWidth()
                                                                    )
                                                                ),
                                                                destination.dest_array[3],
                                                                destination.dest_array[4],
                                                            ]
                                                        )  # make sure keys and values are PdfObject instances
                                                    }
                                                )
                                            else:
                                                annot_obj["/A"].update(
                                                    {
                                                        NameObject("/D"): ArrayObject(
                                                            [
                                                                NumberObject(page_number),
                                                                destination.dest_array[1],
                                                            ]
                                                        )  # make sure keys and values are PdfObject instances
                                                    }
                                                )
                                            rect = annot_obj.get("/Rect")
                                            # Shift the clickable rectangle to the right-hand page
                                            rect = ArrayObject(
                                                [
                                                    FloatObject(
                                                        rect[0]
                                                        + int(page1.mediaBox.getWidth())
                                                    ),
                                                    rect[1],
                                                    FloatObject(
                                                        rect[2]
                                                        + int(page1.mediaBox.getWidth())
                                                    ),
                                                    rect[3],
                                                ]
                                            )
                                            annot_obj.update(
                                                {
                                                    NameObject(
                                                        "/Rect"
                                                    ): rect  # make sure keys and values are PdfObject instances
                                                }
                                            )
                                    # if dest and annot.idnum in page1_annot_id:
                                    # if dest in pdf1_reader.named_destinations:
                                    if dest and page1.annotations:
                                        if annot in page1.annotations:
                                            # Fetch the jump information from the original file, including the target page
                                            destination = pdf1_reader.named_destinations[
                                                dest
                                            ]
                                            page_number = (
                                                pdf1_reader.get_destination_page_number(
                                                    destination
                                                )
                                            )
                                            # Update the jump target: go to the corresponding page at the given coordinates, e.g. (100, 150), with 100% zoom
                                            # "/D": [10, '/XYZ', 100, 100, 0]
                                            if destination.dest_array[1] == "/XYZ":
                                                annot_obj["/A"].update(
                                                    {
                                                        NameObject("/D"): ArrayObject(
                                                            [
                                                                NumberObject(page_number),
                                                                destination.dest_array[1],
                                                                FloatObject(
                                                                    destination.dest_array[
                                                                        2
                                                                    ]
                                                                ),
                                                                destination.dest_array[3],
                                                                destination.dest_array[4],
                                                            ]
                                                        )  # make sure keys and values are PdfObject instances
                                                    }
                                                )
                                            else:
                                                annot_obj["/A"].update(
                                                    {
                                                        NameObject("/D"): ArrayObject(
                                                            [
                                                                NumberObject(page_number),
                                                                destination.dest_array[1],
                                                            ]
                                                        )  # make sure keys and values are PdfObject instances
                                                    }
                                                )
                                            rect = annot_obj.get("/Rect")
                                            rect = ArrayObject(
                                                [
                                                    FloatObject(rect[0]),
                                                    rect[1],
                                                    FloatObject(rect[2]),
                                                    rect[3],
                                                ]
                                            )
                                            annot_obj.update(
                                                {
                                                    NameObject(
                                                        "/Rect"
                                                    ): rect  # make sure keys and values are PdfObject instances
                                                }
                                            )
                                elif "/S" in action and action["/S"] == "/URI":
                                    # External link: jumps to a URI
                                    uri = action.get("/URI")
                output_writer.addPage(new_page)
            # Save the merged PDF file
            with open(output_path, "wb") as output_file:
                output_writer.write(output_file)
 def _merge_pdfs_legacy(pdf1_path, pdf2_path, output_path):
    import PyPDF2  # PyPDF2 has a serious memory leak; run it in a subprocess so the memory is easy to release
    Percent = 0.95
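Note (not part of the patch): the side-by-side merge in `_merge_pdfs_ng` comes down to a little arithmetic on the two page sizes, plus a horizontal shift applied to link rectangles that came from the right-hand page. The sketch below restates that arithmetic as plain functions. The names `side_by_side_layout` and `shift_rect` are hypothetical; the formulas are copied from the hunk above, and the page sizes in the example are arbitrary.

```python
def side_by_side_layout(w1, h1, w2, h2, percent=1):
    """Return (merged_width, merged_height, x_offset_of_page2).

    Same expressions as in _merge_pdfs_ng: `percent` scales how much of
    page 2's width is added (1 in the new code path, 0.95 in the legacy one).
    """
    merged_width = int(int(w1) + int(w2) * percent)
    merged_height = max(h1, h2)
    x_offset = int(int(w1) - int(w2) * (1 - percent))
    return merged_width, merged_height, x_offset


def shift_rect(rect, dx):
    """Shift an annotation /Rect [x0, y0, x1, y1] horizontally by dx."""
    x0, y0, x1, y1 = rect
    return [x0 + dx, y0, x1 + dx, y1]


if __name__ == "__main__":
    print(side_by_side_layout(612, 792, 612, 792))   # (1224, 792, 612)
    print(shift_rect([100, 200, 150, 210], 612))     # [712, 200, 762, 210]
```

With `percent=1` the second page starts exactly where the first one ends, which is why the patch adds the width of `page1` to the x-coordinates of links taken from `page2` and leaves links from `page1` unchanged.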
--- a/crazy_functions/pdf_fns/parse_pdf_via_doc2x.py
+++ b/crazy_functions/pdf_fns/parse_pdf_via_doc2x.py
@@ -4,7 +4,9 @@ from toolbox import promote_file_to_downloadzone, extract_archive
 from toolbox import generate_file_link, zip_folder
 from crazy_functions.crazy_utils import get_files_from_everything
 from shared_utils.colorful import *
 from loguru import logger
 import os
 import time
 def refresh_key(doc2x_api_key):
    import requests, json
@@ -22,105 +24,140 @@ def refresh_key(doc2x_api_key):
        raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
    return doc2x_api_key
 def 解析PDF_DOC2X_转Latex(pdf_file_path):
    zip_file_path, unzipped_folder = 解析PDF_DOC2X(pdf_file_path, format='tex')
    return unzipped_folder
 def 解析PDF_DOC2X(pdf_file_path, format='tex'):
    """
        format: 'tex', 'md', 'docx'
    """
    import requests, json, os
    DOC2X_API_KEY = get_conf('DOC2X_API_KEY')
    latex_dir = get_log_folder(plugin_name="pdf_ocr_latex")
    markdown_dir = get_log_folder(plugin_name="pdf_ocr")
    doc2x_api_key = DOC2X_API_KEY
    if doc2x_api_key.startswith('sk-'):
        url = "https://api.doc2x.noedgeai.com/api/v1/pdf"
    else:
        doc2x_api_key = refresh_key(doc2x_api_key)
        url = "https://api.doc2x.noedgeai.com/api/platform/pdf"
    # < ------ Step 1: upload ------ >
    logger.info("Doc2x step 1: upload")
    with open(pdf_file_path, 'rb') as file:
        res = requests.post(
            "https://v2.doc2x.noedgeai.com/api/v2/parse/pdf",
            headers={"Authorization": "Bearer " + doc2x_api_key},
            data=file
        )
    # res_json = []
    if res.status_code == 200:
        res_json = res.json()
    else:
        raise RuntimeError(f"Doc2x return an error: {res.json()}")
    uuid = res_json['data']['uid']
    # < ------ Step 2: poll and wait ------ >
    logger.info("Doc2x step 2: poll and wait")
    params = {'uid': uuid}
    while True:
        res = requests.get(
            'https://v2.doc2x.noedgeai.com/api/v2/parse/status',
            headers={"Authorization": "Bearer " + doc2x_api_key},
            params=params
        )
        res_json = res.json()
        if res_json['data']['status'] == "success":
            break
        elif res_json['data']['status'] == "processing":
            time.sleep(3)
            logger.info(f"Doc2x is processing at {res_json['data']['progress']}%")
        elif res_json['data']['status'] == "failed":
            raise RuntimeError(f"Doc2x return an error: {res_json}")
    # < ------ Step 3: submit the conversion ------ >
    logger.info("Doc2x step 3: submit the conversion")
    data = {
        "uid": uuid,
        "to": format,
        "formula_mode": "dollar",
        "filename": "output"
    }
    res = requests.post(
-        url,
+        'https://v2.doc2x.noedgeai.com/api/v2/convert/parse',
-        files={"file": open(pdf_file_path, "rb")},
+        headers={"Authorization": "Bearer " + doc2x_api_key},
-        data={"ocr": "1"},
+        json=data
-        headers={"Authorization": "Bearer " + doc2x_api_key}
    )
    res_json = []
    if res.status_code == 200:
-        decoded = res.content.decode("utf-8")
+        res_json = res.json()
-        for z_decoded in decoded.split('\n'):
-            if len(z_decoded) == 0: continue
-            assert z_decoded.startswith("data: ")
-            z_decoded = z_decoded[len("data: "):]
-            decoded_json = json.loads(z_decoded)
-            res_json.append(decoded_json)
    else:
-        raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
+        raise RuntimeError(f"Doc2x return an error: {res.json()}")
-    uuid = res_json[0]['uuid']
-    to = "latex" # latex, md, docx
-    url = "https://api.doc2x.noedgeai.com/api/export"+"?request_id="+uuid+"&to="+to
-    res = requests.get(url, headers={"Authorization": "Bearer " + doc2x_api_key})
+    # < ------ Step 4: wait for the result ------ >
-    latex_zip_path = os.path.join(latex_dir, gen_time_str() + '.zip')
+    logger.info("Doc2x step 4: wait for the result")
-    latex_unzip_path = os.path.join(latex_dir, gen_time_str())
+    params = {'uid': uuid}
-    if res.status_code == 200:
+    while True:
-        with open(latex_zip_path, "wb") as f: f.write(res.content)
+        res = requests.get(
-    else:
+            'https://v2.doc2x.noedgeai.com/api/v2/convert/parse/result',
-        raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
+            headers={"Authorization": "Bearer " + doc2x_api_key},
            params=params
        )
        res_json = res.json()
        if res_json['data']['status'] == "success":
            break
        elif res_json['data']['status'] == "processing":
            time.sleep(3)
            logger.info(f"Doc2x still processing")
        elif res_json['data']['status'] == "failed":
            raise RuntimeError(f"Doc2x return an error: {res_json}")
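    # < ------ Step 5: final processing ------ >
    logger.info("Doc2x step 5: final processing")
    if format=='tex':
        target_path = latex_dir
    elif format=='md':
        target_path = markdown_dir
    else:
        # Without this branch, any other format would fail below with a NameError on target_path
        raise ValueError(f"No output directory is configured for format: {format}")
    os.makedirs(target_path, exist_ok=True)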
    max_attempt = 3
    # < ------ Download ------ >
    for attempt in range(max_attempt):
        try:
            result_url = res_json['data']['url']
            res = requests.get(result_url)
            zip_path = os.path.join(target_path, gen_time_str() + '.zip')
            unzip_path = os.path.join(target_path, gen_time_str())
            if res.status_code == 200:
                with open(zip_path, "wb") as f: f.write(res.content)
                break  # download succeeded, so stop retrying
            else:
                raise RuntimeError(f"Doc2x return an error: {res.json()}")
        except Exception as e:
            if attempt < max_attempt - 1:
                logger.error(f"Failed to download latex file, retrying... {e}")
                time.sleep(3)
                continue
            else:
                raise e
    # < ------ Unzip ------ >
    import zipfile
-    with zipfile.ZipFile(latex_zip_path, 'r') as zip_ref:
+    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
-        zip_ref.extractall(latex_unzip_path)
+        zip_ref.extractall(unzip_path)
-
+    return zip_path, unzip_path
-    return latex_unzip_path
 def 解析PDF_DOC2X_单文件(fp, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, DOC2X_API_KEY, user_request):
    def pdf2markdown(filepath):
-        import requests, json, os
+        chatbot.append((None, f"Doc2x 解析中"))
        markdown_dir = get_log_folder(plugin_name="pdf_ocr")
        doc2x_api_key = DOC2X_API_KEY
        if doc2x_api_key.startswith('sk-'):
            url = "https://api.doc2x.noedgeai.com/api/v1/pdf"
        else:
            doc2x_api_key = refresh_key(doc2x_api_key)
            url = "https://api.doc2x.noedgeai.com/api/platform/pdf"
        chatbot.append((None, "加载PDF文件，发送至DOC2X解析..."))
        yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
-        res = requests.post(
+        md_zip_path, unzipped_folder = 解析PDF_DOC2X(filepath, format='md')
-            url,
-            files={"file": open(filepath, "rb")},
-            data={"ocr": "1"},
-            headers={"Authorization": "Bearer " + doc2x_api_key}
-        )
-        res_json = []
-        if res.status_code == 200:
-            decoded = res.content.decode("utf-8")
-            for z_decoded in decoded.split('\n'):
-                if len(z_decoded) == 0: continue
-                assert z_decoded.startswith("data: ")
-                z_decoded = z_decoded[len("data: "):]
-                decoded_json = json.loads(z_decoded)
-                res_json.append(decoded_json)
-            if 'limit exceeded' in decoded_json.get('status', ''):
-                raise RuntimeError("Doc2x API 页数受限，请联系 Doc2x 方面，并更换新的 API 秘钥。")
-        else:
-            raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
-        uuid = res_json[0]['uuid']
-        to = "md" # latex, md, docx
-        url = "https://api.doc2x.noedgeai.com/api/export"+"?request_id="+uuid+"&to="+to
        chatbot.append((None, f"读取解析: {url} ..."))
        yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
-        res = requests.get(url, headers={"Authorization": "Bearer " + doc2x_api_key})
-        md_zip_path = os.path.join(markdown_dir, gen_time_str() + '.zip')
-        if res.status_code == 200:
-            with open(md_zip_path, "wb") as f: f.write(res.content)
-        else:
-            raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
        promote_file_to_downloadzone(md_zip_path, chatbot=chatbot)
        chatbot.append((None, f"完成解析 {md_zip_path} ..."))
        yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
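Note (not part of the patch): steps 2 and 4 of `解析PDF_DOC2X` both poll a status endpoint until it reports `success` or `failed`, sleeping only while the status is `processing` and with no upper bound on the wait. The sketch below shows one way to factor that loop into a helper with a timeout. The name `poll_until_done`, its parameters and the 600-second default are assumptions for illustration; only the three status strings and the 3-second interval come from the diff.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict],
    interval: float = 3.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status() until it reports success, failure, or the timeout expires.

    fetch_status is expected to return the `data` object of a status
    response, i.e. a dict with a "status" key (hypothetical contract).
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()
        status = data.get("status")
        if status == "success":
            return data
        if status == "failed":
            raise RuntimeError(f"Doc2x return an error: {data}")
        # Any other status (including "processing") waits and retries.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Doc2x did not finish within {timeout} seconds")
        sleep(interval)


if __name__ == "__main__":
    # Fake status source standing in for the HTTP request.
    states = iter([{"status": "processing"}, {"status": "success", "url": "u"}])
    print(poll_until_done(lambda: next(states), sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the helper testable without real delays, and sleeping on every non-terminal status avoids a busy loop if the service ever returns a status other than the three handled above.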
--- a/crazy_functions/rag_fns/llama_index_worker.py
+++ b/crazy_functions/rag_fns/llama_index_worker.py
@@ -1,17 +1,13 @@
 import llama_index
 import os
 import atexit
 from loguru import logger
 from typing import List
 from llama_index.core import Document
 from llama_index.core.schema import TextNode
 from request_llms.embed_models.openai_embed import OpenAiEmbeddingModel
 from shared_utils.connect_void_terminal import get_chat_default_kwargs
 from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
 from crazy_functions.rag_fns.vector_store_index import GptacVectorStoreIndex
 from llama_index.core.ingestion import run_transformations
-from llama_index.core import PromptTemplate
+from llama_index.core.schema import TextNode
-from llama_index.core.response_synthesizers import TreeSummarize
+
 from crazy_functions.rag_fns.vector_store_index import GptacVectorStoreIndex
 from request_llms.embed_models.openai_embed import OpenAiEmbeddingModel
 DEFAULT_QUERY_GENERATION_PROMPT = """\
 Now, you have context information as below:
@@ -63,7 +59,7 @@ class SaveLoad():
    def purge(self):
        import shutil
        shutil.rmtree(self.checkpoint_dir, ignore_errors=True)
-        self.vs_index = self.create_new_vs()
+        self.vs_index = self.create_new_vs(self.checkpoint_dir)
 class LlamaIndexRagWorker(SaveLoad):
@@ -75,7 +71,7 @@ class LlamaIndexRagWorker(SaveLoad):
        if auto_load_checkpoint:
            self.vs_index = self.load_from_checkpoint(checkpoint_dir)
        else:
-            self.vs_index = self.create_new_vs(checkpoint_dir)
+            self.vs_index = self.create_new_vs()
        atexit.register(lambda: self.save_to_checkpoint(checkpoint_dir))
    def assign_embedding_model(self):
@@ -91,40 +87,52 @@ class LlamaIndexRagWorker(SaveLoad):
        logger.info('oo --------inspect_vector_store end--------')
        return vector_store_preview
-    def add_documents_to_vector_store(self, document_list):
+    def add_documents_to_vector_store(self, document_list: List[Document]):
-        documents = [Document(text=t) for t in document_list]
+        """
        Adds a list of Document objects to the vector store after processing.
        """
        documents = document_list
        documents_nodes = run_transformations(
-                        documents,  # type: ignore
+            documents,  # type: ignore
-                        self.vs_index._transformations,
+            self.vs_index._transformations,
-                        show_progress=True
+            show_progress=True
-                    )
+        )
        self.vs_index.insert_nodes(documents_nodes)
-        if self.debug_mode: self.inspect_vector_store()
+        if self.debug_mode:
            self.inspect_vector_store()
-    def add_text_to_vector_store(self, text):
+    def add_text_to_vector_store(self, text: str):
        node = TextNode(text=text)
        documents_nodes = run_transformations(
-                        [node],
+            [node],
-                        self.vs_index._transformations,
+            self.vs_index._transformations,
-                        show_progress=True
+            show_progress=True
-                    )
+        )
        self.vs_index.insert_nodes(documents_nodes)
-        if self.debug_mode: self.inspect_vector_store()
+        if self.debug_mode:
            self.inspect_vector_store()
    def remember_qa(self, question, answer):
        formatted_str = QUESTION_ANSWER_RECORD.format(question=question, answer=answer)
        self.add_text_to_vector_store(formatted_str)
    def retrieve_from_store_with_query(self, query):
-        if self.debug_mode: self.inspect_vector_store()
+        if self.debug_mode:
            self.inspect_vector_store()
        retriever = self.vs_index.as_retriever()
        return retriever.retrieve(query)
    def build_prompt(self, query, nodes):
        context_str = self.generate_node_array_preview(nodes)
        return DEFAULT_QUERY_GENERATION_PROMPT.format(context_str=context_str, query_str=query)
-        
+
    def generate_node_array_preview(self, nodes):
        buf = "\n".join(([f"(No.{i+1} | score {n.score:.3f}): {n.text}" for i, n in enumerate(nodes)]))
        if self.debug_mode: logger.info(buf)
        return buf
    def purge_vector_store(self):
        """
        Purges the current vector store and creates a new one.
        """
        self.purge()
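Note (not part of the patch): `generate_node_array_preview` turns the retrieved nodes into the numbered context string that `build_prompt` substitutes into the query template. The sketch below reproduces only the formatting expression from the diff. `FakeNode` is a hypothetical stand-in for the retriever's result objects, which are assumed to expose `score` and `text`.

```python
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for a retrieved node with a score and a text."""
    score: float
    text: str


def generate_node_array_preview(nodes):
    # Same format string as in LlamaIndexRagWorker.generate_node_array_preview.
    return "\n".join(
        f"(No.{i+1} | score {n.score:.3f}): {n.text}" for i, n in enumerate(nodes)
    )


if __name__ == "__main__":
    print(generate_node_array_preview([FakeNode(0.9123, "alpha"), FakeNode(0.5, "beta")]))
    # (No.1 | score 0.912): alpha
    # (No.2 | score 0.500): beta
```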
--- a/crazy_functions/rag_fns/rag_file_support.py
+++ b/crazy_functions/rag_fns/rag_file_support.py
@@ -0,0 +1,45 @@
 import os
 from llama_index.core import SimpleDirectoryReader
 supports_format = ['.csv', '.docx','.doc', '.epub', '.ipynb',  '.mbox', '.md', '.pdf',  '.txt', '.ppt',
                   '.pptm', '.pptx','.py', '.xls', '.xlsx', '.html', '.json', '.xml', '.yaml', '.yml' ,'.m']
 def read_docx_doc(file_path):
    if os.path.splitext(file_path)[1].lower() == ".docx":
        from docx import Document
        doc = Document(file_path)
        file_content = "\n".join([para.text for para in doc.paragraphs])
    else:
        try:
            import win32com.client
            word = win32com.client.Dispatch("Word.Application")
            word.visible = False
            # Open the file (abspath also handles paths that are already absolute)
            doc = word.Documents.Open(os.path.abspath(file_path))
            # file_content = doc.Content.Text
            doc = word.ActiveDocument
            file_content = doc.Range().Text
            doc.Close()
            word.Quit()
        except:
            raise RuntimeError('请先将.doc文档转换为.docx文档。')
    return file_content
 # Revised extract_text: combines SimpleDirectoryReader with custom parsing logic
 def extract_text(file_path):
    _, ext = os.path.splitext(file_path.lower())
    # .doc/.docx go through the custom reader; SimpleDirectoryReader handles the other formats it supports
    if ext in ['.docx', '.doc']:
        return read_docx_doc(file_path)
    try:
        reader = SimpleDirectoryReader(input_files=[file_path])
        documents = reader.load_data()
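        if len(documents) > 0:
            # A reader can return several Document objects for one file (for example one per PDF page), so join them all
            return "\n".join(doc.text for doc in documents)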
    except Exception as e:
        pass
    return None
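Note (not part of the patch): `supports_format` is defined in this file but not used by the functions shown. One plausible use is a case-insensitive extension check before calling `extract_text`. The sketch below assumes that usage; the helper name `is_supported` is hypothetical and the list is copied from the diff.

```python
import os

# Copied from rag_file_support.py in the diff above.
supports_format = ['.csv', '.docx', '.doc', '.epub', '.ipynb', '.mbox', '.md', '.pdf', '.txt', '.ppt',
                   '.pptm', '.pptx', '.py', '.xls', '.xlsx', '.html', '.json', '.xml', '.yaml', '.yml', '.m']


def is_supported(file_path: str) -> bool:
    """Hypothetical helper: True if the file extension is in supports_format."""
    _, ext = os.path.splitext(file_path.lower())
    return ext in supports_format


if __name__ == "__main__":
    print(is_supported("Paper.PDF"))    # True
    print(is_supported("archive.zip"))  # False
```

Lower-casing before `os.path.splitext` matches what `extract_text` does, so a file named `Paper.PDF` is treated the same as `paper.pdf`.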
--- a/crazy_functions/总结word文档.py
+++ b/crazy_functions/总结word文档.py
@@ -1,127 +0,0 @@
 from toolbox import update_ui
 from toolbox import CatchException, report_exception
 from toolbox import write_history_to_file, promote_file_to_downloadzone
 from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
 fast_debug = False
 def 解析docx(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
    import time, os
    # pip install python-docx: needed for the .docx format, cross-platform
    # pip install pywin32: needed for the .doc format, Windows only
    for index, fp in enumerate(file_manifest):
        if fp.split(".")[-1] == "docx":
            from docx import Document
            doc = Document(fp)
            file_content = "\n".join([para.text for para in doc.paragraphs])
        else:
            try:
                import win32com.client
                word = win32com.client.Dispatch("Word.Application")
                word.visible = False
                # Open the file
                doc = word.Documents.Open(os.getcwd() + '/' + fp)
                # file_content = doc.Content.Text
                doc = word.ActiveDocument
                file_content = doc.Range().Text
                doc.Close()
                word.Quit()
            except:
                raise RuntimeError('请先将.doc文档转换为.docx文档。')
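        # File names under private_upload are often garbled after unzipping a zip archive (rar and 7z are fine), so it is acceptable to analyze only the article content without passing the file name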
        from crazy_functions.pdf_fns.breakdown_txt import breakdown_text_to_satisfy_token_limit
        from request_llms.bridge_all import model_info
        max_token = model_info[llm_kwargs['llm_model']]['max_token']
        TOKEN_LIMIT_PER_FRAGMENT = max_token * 3 // 4
        paper_fragments = breakdown_text_to_satisfy_token_limit(txt=file_content, limit=TOKEN_LIMIT_PER_FRAGMENT, llm_model=llm_kwargs['llm_model'])
        this_paper_history = []
        for i, paper_frag in enumerate(paper_fragments):
            i_say = f'请对下面的文章片段用中文做概述，文件名是{os.path.relpath(fp, project_folder)}，文章内容是 ```{paper_frag}```'
            i_say_show_user = f'请对下面的文章片段做概述: {os.path.abspath(fp)}的第{i+1}/{len(paper_fragments)}个片段。'
            gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
                inputs=i_say,
                inputs_show_user=i_say_show_user,
                llm_kwargs=llm_kwargs,
                chatbot=chatbot,
                history=[],
                sys_prompt="总结文章。"
            )
            chatbot[-1] = (i_say_show_user, gpt_say)
            history.extend([i_say_show_user,gpt_say])
            this_paper_history.extend([i_say_show_user,gpt_say])
        # 已经对该文章的所有片段总结完毕，如果文章被切分了，
        if len(paper_fragments) > 1:
            i_say = f"根据以上的对话，总结文章{os.path.abspath(fp)}的主要内容。"
            gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
                inputs=i_say,
                inputs_show_user=i_say,
                llm_kwargs=llm_kwargs,
                chatbot=chatbot,
                history=this_paper_history,
                sys_prompt="总结文章。"
            )
            history.extend([i_say,gpt_say])
            this_paper_history.extend([i_say,gpt_say])
        res = write_history_to_file(history)
        promote_file_to_downloadzone(res, chatbot=chatbot)
        chatbot.append(("完成了吗？", res))
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
    res = write_history_to_file(history)
    promote_file_to_downloadzone(res, chatbot=chatbot)
    chatbot.append(("所有文件都总结完成了吗？", res))
    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
@CatchException
 def 总结word文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
    import glob, os
    # 基本信息：功能、贡献者
    chatbot.append([
        "函数插件功能？",
        "批量总结Word文档。函数插件贡献者: JasonGuo1。注意, 如果是.doc文件, 请先转化为.docx格式。"])
    yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
    # 尝试导入依赖，如果缺少依赖，则给出安装建议
    try:
        from docx import Document
    except Exception:
        report_exception(chatbot, history,
                         a=f"解析项目: {txt}",
                         b=f"导入软件依赖失败。使用该模块需要额外依赖，安装方法```pip install --upgrade python-docx pywin32```。")
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
        return
    # 清空历史，以免输入溢出
    history = []
    # 检测输入参数，如没有给定输入参数，直接退出
    if os.path.exists(txt):
        project_folder = txt
    else:
        if txt == "": txt = '空空如也的输入栏'
        report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到本地项目或无权访问: {txt}")
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
        return
    # 搜索需要处理的文件清单
    if txt.endswith('.docx') or txt.endswith('.doc'):
        file_manifest = [txt]
    else:
        file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.docx', recursive=True)] + \
                        [f for f in glob.glob(f'{project_folder}/**/*.doc', recursive=True)]
    # 如果没找到任何文件
    if len(file_manifest) == 0:
        report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到任何.docx或doc文件: {txt}")
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
        return
    # 开始正式执行任务
    yield from 解析docx(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
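Note on the fragmenting step used above: `解析docx` caps each fragment at `max_token * 3 // 4`, presumably to leave headroom for the prompt wrapper and the reply, and delegates the split itself to `breakdown_text_to_satisfy_token_limit`. The following is a minimal sketch of that idea only. `fragment_limit`, `split_by_token_budget` and the `count_tokens` callback are hypothetical names introduced for illustration; they are not the project's splitter, which is not shown in this diff.

```python
from typing import Callable, List


def fragment_limit(max_token: int) -> int:
    """Mirror the `max_token * 3 // 4` budget used in the plugin."""
    return max_token * 3 // 4


def split_by_token_budget(text: str, limit: int,
                          count_tokens: Callable[[str], int]) -> List[str]:
    """Greedily pack lines into fragments whose token count stays within `limit`.

    Hypothetical stand-in: a single line longer than `limit` is kept whole
    here, whereas a real splitter would have to cut it further.
    """
    fragments: List[str] = []
    current = ""
    for para in text.split("\n"):
        candidate = para if not current else current + "\n" + para
        if current and count_tokens(candidate) > limit:
            # Adding this line would overflow the budget: close the fragment.
            fragments.append(current)
            current = para
        else:
            current = candidate
    if current:
        fragments.append(current)
    return fragments
```

With a whitespace word counter as `count_tokens`, three short lines and a limit of 4 yield one fragment per line, since any two lines together exceed the budget.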
--- a/crazy_functions/批量文件询问.py
+++ b/crazy_functions/批量文件询问.py
@@ -0,0 +1,496 @@
 import os
 import threading
 import time
 from dataclasses import dataclass
 from typing import List, Tuple, Dict, Generator
 from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
 from crazy_functions.pdf_fns.breakdown_txt import breakdown_text_to_satisfy_token_limit
 from crazy_functions.rag_fns.rag_file_support import extract_text
 from request_llms.bridge_all import model_info
 from toolbox import update_ui, CatchException, report_exception
@dataclass
 class FileFragment:
    """文件片段数据类，用于组织处理单元"""
    file_path: str
    content: str
    rel_path: str
    fragment_index: int
    total_fragments: int
 class BatchDocumentSummarizer:
    """优化的文档总结器 - 批处理版本"""
    def __init__(self, llm_kwargs: Dict, plugin_kwargs: Dict, chatbot: List, history: List, system_prompt: str):
        """初始化总结器"""
        self.llm_kwargs = llm_kwargs
        self.plugin_kwargs = plugin_kwargs
        self.chatbot = chatbot
        self.history = history
        self.system_prompt = system_prompt
        self.failed_files = []
        self.file_summaries_map = {}
    def _get_token_limit(self) -> int:
        """获取模型token限制"""
        max_token = model_info[self.llm_kwargs['llm_model']]['max_token']
        return max_token * 3 // 4
    def _create_batch_inputs(self, fragments: List[FileFragment]) -> Tuple[List, List, List]:
        """创建批处理输入"""
        inputs_array = []
        inputs_show_user_array = []
        history_array = []
        for frag in fragments:
            if self.plugin_kwargs.get("advanced_arg"):
                i_say = (f'请按照用户要求对文件内容进行处理，文件名为{os.path.basename(frag.file_path)}，'
                         f'用户要求为：{self.plugin_kwargs["advanced_arg"]}：'
                         f'文件内容是 ```{frag.content}```')
                i_say_show_user = (f'正在处理 {frag.rel_path} (片段 {frag.fragment_index + 1}/{frag.total_fragments})')
            else:
                i_say = (f'请对下面的内容用中文做总结，不超过500字，文件名是{os.path.basename(frag.file_path)}，'
                         f'内容是 ```{frag.content}```')
                i_say_show_user = f'正在处理 {frag.rel_path} (片段 {frag.fragment_index + 1}/{frag.total_fragments})'
            inputs_array.append(i_say)
            inputs_show_user_array.append(i_say_show_user)
            history_array.append([])
        return inputs_array, inputs_show_user_array, history_array
    def _process_single_file_with_timeout(self, file_info: Tuple[str, str], mutable_status: List) -> List[FileFragment]:
        """包装了超时控制的文件处理函数"""
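        # Capture the worker thread first: the Timer callback runs in its own thread,
        # so calling threading.current_thread() inside it would return the timer thread
        # and the timeout flag would never be set on the worker.
        thread = threading.current_thread()
        thread._timeout_occurred = False
        def timeout_handler():
            thread._timeout_occurred = True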
        # 设置超时定时器
        timer = threading.Timer(self.watch_dog_patience, timeout_handler)
        timer.start()
        try:
            fp, project_folder = file_info
            fragments = []
            # 定期检查是否超时
            def check_timeout():
                if hasattr(thread, '_timeout_occurred') and thread._timeout_occurred:
                    raise TimeoutError("处理超时")
            # 更新状态
            mutable_status[0] = "检查文件大小"
            mutable_status[1] = time.time()
            check_timeout()
            # 文件大小检查
            if os.path.getsize(fp) > self.max_file_size:
                self.failed_files.append((fp, f"文件过大：超过{self.max_file_size / 1024 / 1024}MB"))
                mutable_status[2] = "文件过大"
                return fragments
            check_timeout()
            # 更新状态
            mutable_status[0] = "提取文件内容"
            mutable_status[1] = time.time()
            # 提取内容
            content = extract_text(fp)
            if content is None:
                self.failed_files.append((fp, "文件解析失败：不支持的格式或文件损坏"))
                mutable_status[2] = "格式不支持"
                return fragments
            elif not content.strip():
                self.failed_files.append((fp, "文件内容为空"))
                mutable_status[2] = "内容为空"
                return fragments
            check_timeout()
            # 更新状态
            mutable_status[0] = "分割文本"
            mutable_status[1] = time.time()
            # 分割文本
            try:
                paper_fragments = breakdown_text_to_satisfy_token_limit(
                    txt=content,
                    limit=self._get_token_limit(),
                    llm_model=self.llm_kwargs['llm_model']
                )
            except Exception as e:
                self.failed_files.append((fp, f"文本分割失败：{str(e)}"))
                mutable_status[2] = "分割失败"
                return fragments
            check_timeout()
            # 处理片段
            rel_path = os.path.relpath(fp, project_folder)
            for i, frag in enumerate(paper_fragments):
                if frag.strip():
                    fragments.append(FileFragment(
                        file_path=fp,
                        content=frag,
                        rel_path=rel_path,
                        fragment_index=i,
                        total_fragments=len(paper_fragments)
                    ))
            mutable_status[2] = "处理完成"
            return fragments
        except TimeoutError as e:
            self.failed_files.append((fp, "处理超时"))
            mutable_status[2] = "处理超时"
            return []
        except Exception as e:
            self.failed_files.append((fp, f"处理失败：{str(e)}"))
            mutable_status[2] = "处理异常"
            return []
        finally:
            timer.cancel()
    def prepare_fragments(self, project_folder: str, file_paths: List[str]) -> Generator:
        """Prepare the fragments of all files in parallel."""
        import concurrent.futures
        from concurrent.futures import ThreadPoolExecutor
        all_fragments = []
        total_files = len(file_paths)
        # 配置参数
        self.refresh_interval = 0.2  # UI刷新间隔
        self.watch_dog_patience = 5  # 看门狗超时时间
        self.max_file_size = 10 * 1024 * 1024  # 10MB限制
        self.max_workers = max(1, min(32, len(file_paths)))  # at most 32 threads; ThreadPoolExecutor rejects 0
        # 创建有超时控制的线程池
        executor = ThreadPoolExecutor(max_workers=self.max_workers)
        # 用于跨线程状态传递的可变列表 - 增加文件名信息
        mutable_status_array = [["等待中", time.time(), "pending", file_path] for file_path in file_paths]
        # 创建文件处理任务
        file_infos = [(fp, project_folder) for fp in file_paths]
        # 提交所有任务，使用带超时控制的处理函数
        futures = [
            executor.submit(
                self._process_single_file_with_timeout,
                file_info,
                mutable_status_array[i]
            ) for i, file_info in enumerate(file_infos)
        ]
        # 更新UI的计数器
        cnt = 0
        try:
            # 监控任务执行
            while True:
                time.sleep(self.refresh_interval)
                cnt += 1
                # 检查任务完成状态
                worker_done = [f.done() for f in futures]
                # 更新状态显示
                status_str = ""
                for i, (status, timestamp, desc, file_path) in enumerate(mutable_status_array):
                    # 获取文件名（去掉路径）
                    file_name = os.path.basename(file_path)
                    if worker_done[i]:
                        status_str += f"文件 {file_name}: {desc}\n"
                    else:
                        status_str += f"文件 {file_name}: {status} {desc}\n"
                # 更新UI
                self.chatbot[-1] = [
                    "处理进度",
                    f"正在处理文件...\n\n{status_str}" + "." * (cnt % 10 + 1)
                ]
                yield from update_ui(chatbot=self.chatbot, history=self.history)
                # 检查是否所有任务完成
                if all(worker_done):
                    break
        finally:
            # 确保线程池正确关闭
            executor.shutdown(wait=False)
        # 收集结果
        processed_files = 0
        for file_index, future in enumerate(futures):
            try:
                fragments = future.result(timeout=0.1)  # allow a short timeout when fetching the result
                all_fragments.extend(fragments)
                processed_files += 1
            except concurrent.futures.TimeoutError:
                # fetching the result timed out
                self.failed_files.append((file_paths[file_index], "结果获取超时"))
            except Exception as e:
                # any other failure
                self.failed_files.append((file_paths[file_index], f"未知错误：{str(e)}"))
        # 最终进度更新
        self.chatbot.append([
            "文件处理完成",
            f"成功处理 {len(all_fragments)} 个片段，失败 {len(self.failed_files)} 个文件"
        ])
        yield from update_ui(chatbot=self.chatbot, history=self.history)
        return all_fragments
    def _process_fragments_batch(self, fragments: List[FileFragment]) -> Generator:
        """批量处理文件片段"""
        from collections import defaultdict
        batch_size = 64  # 每批处理的片段数
        max_retries = 3  # 最大重试次数
        retry_delay = 5  # 重试延迟（秒）
        results = defaultdict(list)
        # 按批次处理
        for i in range(0, len(fragments), batch_size):
            batch = fragments[i:i + batch_size]
            inputs_array, inputs_show_user_array, history_array = self._create_batch_inputs(batch)
            sys_prompt_array = ["请总结以下内容："] * len(batch)
            # 添加重试机制
            for retry in range(max_retries):
                try:
                    response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
                        inputs_array=inputs_array,
                        inputs_show_user_array=inputs_show_user_array,
                        llm_kwargs=self.llm_kwargs,
                        chatbot=self.chatbot,
                        history_array=history_array,
                        sys_prompt_array=sys_prompt_array,
                    )
                    # 处理响应
                    for j, frag in enumerate(batch):
                        summary = response_collection[j * 2 + 1]
                        if summary and summary.strip():
                            results[frag.rel_path].append({
                                'index': frag.fragment_index,
                                'summary': summary,
                                'total': frag.total_fragments
                            })
                    break  # 成功处理，跳出重试循环
                except Exception as e:
                    if retry == max_retries - 1:  # 最后一次重试失败
                        for frag in batch:
                            self.failed_files.append((frag.file_path, f"处理失败：{str(e)}"))
                    else:
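                        # list.append returns None, so append first and then refresh the UI
                        self.chatbot.append([f"批次处理失败，{retry_delay}秒后重试...", str(e)])
                        yield from update_ui(chatbot=self.chatbot, history=self.history)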
                        time.sleep(retry_delay)
        return results
    def _generate_final_summary_request(self) -> Tuple[List, List, List]:
        """准备最终总结请求"""
        if not self.file_summaries_map:
            return (["无可用的文件总结"], ["生成最终总结"], [[]])
        summaries = list(self.file_summaries_map.values())
        if all(not summary for summary in summaries):
            return (["所有文件处理均失败"], ["生成最终总结"], [[]])
        if self.plugin_kwargs.get("advanced_arg"):
            i_say = "根据以上所有文件的处理结果，按要求进行综合处理：" + self.plugin_kwargs['advanced_arg']
        else:
            i_say = "请根据以上所有文件的处理结果，生成最终的总结，不超过1000字。"
        return ([i_say], [i_say], [summaries])
    def process_files(self, project_folder: str, file_paths: List[str]) -> Generator:
        """处理所有文件"""
        total_files = len(file_paths)
        self.chatbot.append([f"开始处理", f"总计 {total_files} 个文件"])
        yield from update_ui(chatbot=self.chatbot, history=self.history)
        # 1. 准备所有文件片段
        # 在 process_files 函数中：
        fragments = yield from self.prepare_fragments(project_folder, file_paths)
        if not fragments:
            self.chatbot.append(["处理失败", "没有可处理的文件内容"])
            return "没有可处理的文件内容"
        # 2. 批量处理所有文件片段
        self.chatbot.append([f"文件分析", f"共计 {len(fragments)} 个处理单元"])
        yield from update_ui(chatbot=self.chatbot, history=self.history)
        try:
            file_summaries = yield from self._process_fragments_batch(fragments)
        except Exception as e:
            self.chatbot.append(["处理错误", f"批处理过程失败：{str(e)}"])
            return "处理过程发生错误"
        # 3. 为每个文件生成整体总结
        self.chatbot.append(["生成总结", "正在汇总文件内容..."])
        yield from update_ui(chatbot=self.chatbot, history=self.history)
        # 处理每个文件的总结
        for rel_path, summaries in file_summaries.items():
            if len(summaries) > 1:  # 多片段文件需要生成整体总结
                sorted_summaries = sorted(summaries, key=lambda x: x['index'])
                if self.plugin_kwargs.get("advanced_arg"):
                    i_say = f'请按照用户要求对文件内容进行处理，用户要求为：{self.plugin_kwargs["advanced_arg"]}：'
                else:
                    i_say = f"请总结文件 {os.path.basename(rel_path)} 的主要内容，不超过500字。"
                try:
                    summary_texts = [s['summary'] for s in sorted_summaries]
                    response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
                        inputs_array=[i_say],
                        inputs_show_user_array=[f"生成 {rel_path} 的处理结果"],
                        llm_kwargs=self.llm_kwargs,
                        chatbot=self.chatbot,
                        history_array=[summary_texts],
                        sys_prompt_array=["你是一个优秀的助手，"],
                    )
                    self.file_summaries_map[rel_path] = response_collection[1]
                except Exception as e:
                    self.chatbot.append(["警告", f"文件 {rel_path} 总结生成失败：{str(e)}"])
                    self.file_summaries_map[rel_path] = "总结生成失败"
            else:  # 单片段文件直接使用其唯一的总结
                self.file_summaries_map[rel_path] = summaries[0]['summary']
        # 4. 生成最终总结
        if total_files == 1:
            return "文件数为1，此时不调用总结模块"
        else:
            try:
                # 收集所有文件的总结用于生成最终总结
                file_summaries_for_final = []
                for rel_path, summary in self.file_summaries_map.items():
                    file_summaries_for_final.append(f"文件 {rel_path} 的总结：\n{summary}")
                if self.plugin_kwargs.get("advanced_arg"):
                    final_summary_prompt = ("根据以下所有文件的总结内容，按要求进行综合处理：" +
                                            self.plugin_kwargs['advanced_arg'])
                else:
                    final_summary_prompt = "请根据以下所有文件的总结内容，生成最终的总结报告。"
                response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
                    inputs_array=[final_summary_prompt],
                    inputs_show_user_array=["生成最终总结报告"],
                    llm_kwargs=self.llm_kwargs,
                    chatbot=self.chatbot,
                    history_array=[file_summaries_for_final],
                    sys_prompt_array=["总结所有文件内容。"],
                    max_workers=1
                )
                return response_collection[1] if len(response_collection) > 1 else "生成总结失败"
            except Exception as e:
                self.chatbot.append(["错误", f"最终总结生成失败：{str(e)}"])
                return "生成总结失败"
    def save_results(self, final_summary: str):
        """保存结果到文件"""
        from toolbox import promote_file_to_downloadzone, write_history_to_file
        from crazy_functions.doc_fns.batch_file_query_doc import MarkdownFormatter, HtmlFormatter, WordFormatter
        import os
        timestamp = time.strftime("%Y%m%d_%H%M%S")
        # 创建各种格式化器
        md_formatter = MarkdownFormatter(final_summary, self.file_summaries_map, self.failed_files)
        html_formatter = HtmlFormatter(final_summary, self.file_summaries_map, self.failed_files)
        word_formatter = WordFormatter(final_summary, self.file_summaries_map, self.failed_files)
        result_files = []
        # 保存 Markdown
        md_content = md_formatter.create_document()
        result_file_md = write_history_to_file(
            history=[md_content],  # 直接传入内容列表
            file_basename=f"文档总结_{timestamp}.md"
        )
        result_files.append(result_file_md)
        # 保存 HTML
        html_content = html_formatter.create_document()
        result_file_html = write_history_to_file(
            history=[html_content],
            file_basename=f"文档总结_{timestamp}.html"
        )
        result_files.append(result_file_html)
        # 保存 Word
        doc = word_formatter.create_document()
        # 由于 Word 文档需要用 doc.save()，我们使用与 md 文件相同的目录
        result_file_docx = os.path.join(
            os.path.dirname(result_file_md),
            f"文档总结_{timestamp}.docx"
        )
        doc.save(result_file_docx)
        result_files.append(result_file_docx)
        # 添加到下载区
        for file in result_files:
            promote_file_to_downloadzone(file, chatbot=self.chatbot)
        self.chatbot.append(["处理完成", f"结果已保存至: {', '.join(result_files)}"])
@CatchException
 def 批量文件询问(txt: str, llm_kwargs: Dict, plugin_kwargs: Dict, chatbot: List,
                 history: List, system_prompt: str, user_request: str):
    """主函数 - 优化版本"""
    # 初始化
    import glob
    import re
    from crazy_functions.rag_fns.rag_file_support import supports_format
    from toolbox import report_exception
    summarizer = BatchDocumentSummarizer(llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
    chatbot.append(["函数插件功能", f"作者：lbykkkk，批量总结文件。支持格式: {', '.join(supports_format)}等其他文本格式文件，如果长时间卡在文件处理过程，请查看处理进度，然后删除所有处于“pending”状态的文件，然后重新上传处理。"])
    yield from update_ui(chatbot=chatbot, history=history)
    # 验证输入路径
    if not os.path.exists(txt):
        report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到项目或无权访问: {txt}")
        yield from update_ui(chatbot=chatbot, history=history)
        return
    # 获取文件列表
    project_folder = txt
    extract_folder = next((d for d in glob.glob(f'{project_folder}/*')
                           if os.path.isdir(d) and d.endswith('.extract')), project_folder)
    exclude_patterns = r'/[^/]+\.(zip|rar|7z|tar|gz)$'
    file_manifest = [f for f in glob.glob(f'{extract_folder}/**', recursive=True)
                     if os.path.isfile(f) and not re.search(exclude_patterns, f)]
    if not file_manifest:
        report_exception(chatbot, history, a=f"解析项目: {txt}", b="未找到支持的文件类型")
        yield from update_ui(chatbot=chatbot, history=history)
        return
    # 处理所有文件并生成总结
    final_summary = yield from summarizer.process_files(project_folder, file_manifest)
    yield from update_ui(chatbot=chatbot, history=history)
    # 保存结果
    summarizer.save_results(final_summary)
    yield from update_ui(chatbot=chatbot, history=history)
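Note on the per-file timeout in `_process_single_file_with_timeout`: the deadline is cooperative. A `threading.Timer` only raises a flag, and the worker notices it at the checkpoints between steps (`check_timeout()`), so a single blocking call such as `extract_text` is not interrupted. The sketch below shows the same pattern with a `threading.Event` instead of an attribute set on the thread object. `run_with_deadline` and `slow_step` are hypothetical names used for illustration, not part of the plugin.

```python
import threading
import time
from typing import Any, Callable, List


def run_with_deadline(steps: List[Callable[[], Any]], patience: float) -> List[Any]:
    """Run `steps` in order; raise TimeoutError at the next checkpoint after `patience` seconds."""
    expired = threading.Event()
    # The Timer callback runs in its own thread, so it only sets a shared flag.
    timer = threading.Timer(patience, expired.set)
    timer.start()
    try:
        results = []
        for step in steps:
            # Checkpoint: the deadline is only observed between steps.
            if expired.is_set():
                raise TimeoutError("deadline exceeded")
            results.append(step())
        return results
    finally:
        # Always cancel the timer so it does not outlive the call.
        timer.cancel()


def slow_step(seconds: float = 0.2) -> str:
    """Stand-in for a blocking operation such as reading a large file."""
    time.sleep(seconds)
    return "slow"
```

Calling `run_with_deadline([slow_step, lambda: 1], patience=0.05)` lets `slow_step` finish and then raises `TimeoutError` before the second step starts, which is the same granularity the plugin gets from its `check_timeout()` calls.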
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -180,6 +180,7 @@ version: '3'
 services:
  gpt_academic_with_latex:
    image: ghcr.io/binary-husky/gpt_academic_with_latex:master  # (Auto Built by Dockerfile: docs/GithubAction+NoLocal+Latex)
    # 对于ARM64设备，请将以上镜像名称替换为 ghcr.io/binary-husky/gpt_academic_with_latex_arm:master
    environment:
      # 请查阅 `config.py` 以查看所有的配置信息
      API_KEY:                  '    sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx                              '
--- a/docs/Dockerfile+JittorLLM
+++ b/docs/Dockerfile+JittorLLM
@@ -1 +0,0 @@
 # 此Dockerfile不再维护，请前往docs/GithubAction+JittorLLMs
--- a/docs/GithubAction+AllCapacityBeta
+++ b/docs/GithubAction+AllCapacityBeta
@@ -1,57 +0,0 @@
 # docker build -t gpt-academic-all-capacity -f docs/GithubAction+AllCapacity  --network=host --build-arg http_proxy=http://localhost:10881 --build-arg https_proxy=http://localhost:10881 .
 # docker build -t gpt-academic-all-capacity -f docs/GithubAction+AllCapacityBeta  --network=host .
 # docker run -it --net=host gpt-academic-all-capacity  bash
 # 从NVIDIA源，从而支持显卡（检查宿主的nvidia-smi中的cuda版本必须>=11.3）
 FROM fuqingxu/11.3.1-runtime-ubuntu20.04-with-texlive:latest
 # edge-tts需要的依赖，某些pip包所需的依赖
 RUN apt update && apt install ffmpeg build-essential -y
 # use python3 as the system default python
 WORKDIR /gpt
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
 # # 非必要步骤，更换pip源 （以下三行，可以删除）
 # RUN echo '[global]' > /etc/pip.conf && \
 #     echo 'index-url = https://mirrors.aliyun.com/pypi/simple/' >> /etc/pip.conf && \
 #     echo 'trusted-host = mirrors.aliyun.com' >> /etc/pip.conf
 # 下载pytorch
 RUN python3 -m pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
 # 准备pip依赖
 RUN python3 -m pip install openai numpy arxiv rich
 RUN python3 -m pip install colorama Markdown pygments pymupdf
 RUN python3 -m pip install python-docx moviepy pdfminer
 RUN python3 -m pip install zh_langchain==0.2.1 pypinyin
 RUN python3 -m pip install rarfile py7zr
 RUN python3 -m pip install aliyun-python-sdk-core==2.13.3 pyOpenSSL webrtcvad scipy git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git
 # 下载分支
 WORKDIR /gpt
 RUN git clone --depth=1 https://github.com/binary-husky/gpt_academic.git
 WORKDIR /gpt/gpt_academic
 RUN git clone --depth=1 https://github.com/OpenLMLab/MOSS.git request_llms/moss
 RUN python3 -m pip install -r requirements.txt
 RUN python3 -m pip install -r request_llms/requirements_moss.txt
 RUN python3 -m pip install -r request_llms/requirements_qwen.txt
 RUN python3 -m pip install -r request_llms/requirements_chatglm.txt
 RUN python3 -m pip install -r request_llms/requirements_newbing.txt
 RUN python3 -m pip install nougat-ocr
 # 预热Tiktoken模块
 RUN python3  -c 'from check_proxy import warm_up_modules; warm_up_modules()'
 # 安装知识库插件的额外依赖
 RUN apt-get update && apt-get install libgl1 -y
 RUN pip3 install transformers protobuf langchain sentence-transformers  faiss-cpu nltk beautifulsoup4 bitsandbytes tabulate icetk --upgrade
 RUN pip3 install unstructured[all-docs] --upgrade
 RUN python3  -c 'from check_proxy import warm_up_vectordb; warm_up_vectordb()'
 RUN rm -rf /usr/local/lib/python3.8/dist-packages/tests
 # COPY .cache /root/.cache
 # COPY config_private.py config_private.py
 # 启动
 CMD ["python3", "-u", "main.py"]
--- a/docs/GithubAction+NoLocal+Latex
+++ b/docs/GithubAction+NoLocal+Latex
@@ -1,35 +1,34 @@
-# 此Dockerfile适用于“无本地模型”的环境构建，如果需要使用chatglm等本地模型，请参考 docs/Dockerfile+ChatGLM
+# 此Dockerfile适用于"无本地模型"的环境构建，如果需要使用chatglm等本地模型，请参考 docs/Dockerfile+ChatGLM
 # - 1 修改 `config.py`
 # - 2 构建 docker build -t gpt-academic-nolocal-latex -f docs/GithubAction+NoLocal+Latex .
 # - 3 运行 docker run -v /home/fuqingxu/arxiv_cache:/root/arxiv_cache --rm -it --net=host gpt-academic-nolocal-latex
-FROM fuqingxu/python311_texlive_ctex:latest
+FROM menghuan1918/ubuntu_uv_ctex:latest
-ENV PATH "$PATH:/usr/local/texlive/2022/bin/x86_64-linux"
+ENV DEBIAN_FRONTEND=noninteractive
-ENV PATH "$PATH:/usr/local/texlive/2023/bin/x86_64-linux"
+SHELL ["/bin/bash", "-c"]
 ENV PATH "$PATH:/usr/local/texlive/2024/bin/x86_64-linux"
 ENV PATH "$PATH:/usr/local/texlive/2025/bin/x86_64-linux"
 ENV PATH "$PATH:/usr/local/texlive/2026/bin/x86_64-linux"
 # 指定路径
 WORKDIR /gpt
-RUN pip3 install openai numpy arxiv rich
+# 先复制依赖文件
-RUN pip3 install colorama Markdown pygments pymupdf
+COPY requirements.txt .
 RUN pip3 install python-docx pdfminer
 RUN pip3 install nougat-ocr
 # 装载项目文件
 COPY . .
 # 安装依赖
-RUN pip3 install -r requirements.txt
+RUN pip install --break-system-packages openai numpy arxiv rich colorama Markdown pygments pymupdf python-docx pdfminer \
    && pip install --break-system-packages -r requirements.txt \
    && if [ "$(uname -m)" = "x86_64" ]; then \
    pip install --break-system-packages nougat-ocr; \
    fi \
    && pip cache purge \
    && rm -rf /root/.cache/pip/*
-# edge-tts需要的依赖
+# 创建非root用户
-RUN apt update && apt install ffmpeg -y
+RUN useradd -m gptuser && chown -R gptuser /gpt
 USER gptuser
 # 最后才复制代码文件,这样代码更新时只需重建最后几层，可以大幅减少docker pull所需的大小
 COPY --chown=gptuser:gptuser . .
 # 可选步骤，用于预热模块
-RUN python3  -c 'from check_proxy import warm_up_modules; warm_up_modules()'
+RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
 # 启动
 CMD ["python3", "-u", "main.py"]
--- a/docs/WithFastapi.md
+++ b/docs/WithFastapi.md
@@ -4,7 +4,7 @@ We currently support fastapi in order to solve sub-path deploy issue.
 1. change CUSTOM_PATH setting in `config.py`
-``` sh
+```sh
 nano config.py
 ```
@@ -35,9 +35,8 @@ if __name__ == "__main__":
    main()
 ```
 3. Go!
-``` sh
+```sh
 python main.py
 ```
--- a/docs/translate_english.json
+++ b/docs/translate_english.json
--- a/docs/translate_std.json
+++ b/docs/translate_std.json
@@ -108,5 +108,22 @@
    "解析PDF_简单拆解": "ParsePDF_simpleDecomposition",
    "解析PDF_DOC2X_单文件": "ParsePDF_DOC2X_singleFile",
    "注释Python项目": "CommentPythonProject",
-    "注释源代码": "CommentSourceCode"
+    "注释源代码": "CommentSourceCode",
    "log亮黄": "log_yellow",
    "log亮绿": "log_green",
    "log亮红": "log_red",
    "log亮紫": "log_purple",
    "log亮蓝": "log_blue",
    "Rag问答": "RagQA",
    "sprint红": "sprint_red",
    "sprint绿": "sprint_green",
    "sprint黄": "sprint_yellow",
    "sprint蓝": "sprint_blue",
    "sprint紫": "sprint_purple",
    "sprint靛": "sprint_indigo",
    "sprint亮红": "sprint_bright_red",
    "sprint亮绿": "sprint_bright_green",
    "sprint亮黄": "sprint_bright_yellow",
    "sprint亮蓝": "sprint_bright_blue",
    "sprint亮紫": "sprint_bright_purple"
 }
--- a/request_llms/bridge_all.py
+++ b/request_llms/bridge_all.py
@@ -256,6 +256,8 @@ model_info = {
        "max_token": 128000,
        "tokenizer": tokenizer_gpt4,
        "token_cnt": get_token_num_gpt4,
        "openai_disable_system_prompt": True,
        "openai_disable_stream": True,
    },
    "o1-mini": {
        "fn_with_ui": chatgpt_ui,
@@ -264,6 +266,8 @@ model_info = {
        "max_token": 128000,
        "tokenizer": tokenizer_gpt4,
        "token_cnt": get_token_num_gpt4,
        "openai_disable_system_prompt": True,
        "openai_disable_stream": True,
    },
    "gpt-4-turbo": {
@@ -1116,6 +1120,24 @@ if len(AZURE_CFG_ARRAY) > 0:
        if azure_model_name not in AVAIL_LLM_MODELS:
            AVAIL_LLM_MODELS += [azure_model_name]
 # -=-=-=-=-=-=- Openrouter模型对齐支持 -=-=-=-=-=-=-
 # 为了更灵活地接入Openrouter路由，设计了此接口
 for model in [m for m in AVAIL_LLM_MODELS if m.startswith("openrouter-")]:
    from request_llms.bridge_openrouter import predict_no_ui_long_connection as openrouter_noui
    from request_llms.bridge_openrouter import predict as openrouter_ui
    model_info.update({
        model: {
            "fn_with_ui": openrouter_ui,
            "fn_without_ui": openrouter_noui,
            # 以下参数参考gpt-4o-mini的配置, 请根据实际情况修改
            "endpoint": openai_endpoint,
            "has_multimodal_capacity": True,
            "max_token": 128000,
            "tokenizer": tokenizer_gpt4,
            "token_cnt": get_token_num_gpt4,
        },
    })
 # -=-=-=-=-=-=--=-=-=-=-=-=--=-=-=-=-=-=--=-=-=-=-=-=-=-=
 # -=-=-=-=-=-=-=-=-=- ☝️ 以上是模型路由 -=-=-=-=-=-=-=-=-=
@@ -1261,5 +1283,5 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot,
    if additional_fn: # 根据基础功能区 ModelOverride 参数调整模型类型
        llm_kwargs, additional_fn, method = execute_model_override(llm_kwargs, additional_fn, method)
    # 更新一下llm_kwargs的参数，否则会出现参数不匹配的问题
    yield from method(inputs, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, stream, additional_fn)
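Note on the Openrouter block in this file: every model name in `AVAIL_LLM_MODELS` that starts with `openrouter-` receives the same routing entry, copied from the gpt-4o-mini settings. A minimal sketch of that prefix-based registration follows. `register_prefixed_models` and its parameters are hypothetical names for illustration. Unlike the loop in the diff, which calls `model_info.update` unconditionally, this sketch skips names that are already registered.

```python
from typing import Dict, List


def register_prefixed_models(model_info: Dict[str, dict], available: List[str],
                             prefix: str, defaults: dict) -> List[str]:
    """Give every available model whose name starts with `prefix` a copy of `defaults`."""
    added: List[str] = []
    for name in available:
        if name.startswith(prefix) and name not in model_info:
            # Copy so that later per-model tweaks do not leak into other entries.
            model_info[name] = dict(defaults)
            added.append(name)
    return added
```

Copying `defaults` per model keeps the entries independent, which matters if a caller later adjusts `max_token` for a single routed model.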
--- a/request_llms/bridge_chatgpt.py
+++ b/request_llms/bridge_chatgpt.py
@@ -134,22 +134,33 @@ def predict_no_ui_long_connection(inputs:str, llm_kwargs:dict, history:list=[],
    observe_window = None：
        用于负责跨越线程传递已经输出的部分，大部分时候仅仅为了fancy的视觉效果，留空即可。observe_window[0]：观测窗。observe_window[1]：看门狗
    """
    from request_llms.bridge_all import model_info
    watch_dog_patience = 5 # 看门狗的耐心, 设置5秒即可
-    headers, payload = generate_payload(inputs, llm_kwargs, history, system_prompt=sys_prompt, stream=True)
+
    # Models flagged with openai_disable_stream (e.g. o1) must be called without streaming
    stream = not model_info[llm_kwargs['llm_model']].get('openai_disable_stream', False)
    headers, payload = generate_payload(inputs, llm_kwargs, history, system_prompt=sys_prompt, stream=stream)
    retry = 0
    while True:
        try:
            # make a POST request to the API endpoint; `stream` is False only for models that disable streaming
            from .bridge_all import model_info
            endpoint = verify_endpoint(model_info[llm_kwargs['llm_model']]['endpoint'])
            response = requests.post(endpoint, headers=headers, proxies=proxies,
-                                    json=payload, stream=True, timeout=TIMEOUT_SECONDS); break
+                                    json=payload, stream=stream, timeout=TIMEOUT_SECONDS); break
        except requests.exceptions.ReadTimeout as e:
            retry += 1
            traceback.print_exc()
            if retry > MAX_RETRY: raise TimeoutError
            if MAX_RETRY!=0: logger.error(f'请求超时，正在重试 ({retry}/{MAX_RETRY}) ……')
    if not stream:
        # This branch applies only to o1 models that do not support streaming; it does not apply to any other case
        chunkjson = json.loads(response.content.decode())
        gpt_replying_buffer = chunkjson['choices'][0]["message"]["content"]
        return gpt_replying_buffer
    stream_response = response.iter_lines()
    result = ''
    json_data = None
@@ -181,7 +192,7 @@ def predict_no_ui_long_connection(inputs:str, llm_kwargs:dict, history:list=[],
        if (not has_content) and (not has_role): continue # raise RuntimeError("发现不标准的第三方接口："+delta)
        if has_content: # has_role = True/False
            result += delta["content"]
-            if not console_slience: logger.info(delta["content"], end='')
+            if not console_slience: print(delta["content"], end='')
            if observe_window is not None:
                # Observation window: publish the data received so far
                if len(observe_window) >= 1:
@@ -191,10 +202,13 @@ def predict_no_ui_long_connection(inputs:str, llm_kwargs:dict, history:list=[],
                    if (time.time()-observe_window[1]) > watch_dog_patience:
                        raise RuntimeError("用户取消了程序。")
        else: raise RuntimeError("意外Json结构："+delta)
-    if json_data and json_data['finish_reason'] == 'content_filter':
-        raise RuntimeError("由于提问含不合规内容被Azure过滤。")
-    if json_data and json_data['finish_reason'] == 'length':
+
+    finish_reason = json_data.get('finish_reason', None) if json_data else None
+    if finish_reason == 'content_filter':
        raise RuntimeError("由于提问含不合规内容被过滤。")
    if finish_reason == 'length':
        raise ConnectionAbortedError("正常结束，但显示Token不足，导致输出不完整，请削减单次输入的文本量。")
    return result
@@ -209,7 +223,7 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
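    chatbot is the conversation list shown in the WebUI; modify it and then yield it to update the chat interface directly
    additional_fn indicates which button was clicked; see functional.py for the buttons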
    """
-    from .bridge_all import model_info
+    from request_llms.bridge_all import model_info
    if is_any_api_key(inputs):
        chatbot._cookies['api_key'] = inputs
        chatbot.append(("输入已识别为openai的api_key", what_keys(inputs)))
@@ -238,6 +252,10 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
    chatbot.append((_inputs, ""))
    yield from update_ui(chatbot=chatbot, history=history, msg="等待响应") # 刷新界面
    # Special handling for models with streaming disabled
    if model_info[llm_kwargs['llm_model']].get('openai_disable_stream', False): stream = False
    else: stream = True
    # check mis-behavior
    if is_the_upload_folder(user_input):
        chatbot[-1] = (inputs, f"[Local Message] 检测到操作错误！当您上传文档之后，需点击“**函数插件区**”按钮进行处理，请勿点击“提交”按钮或者“基础功能区”按钮。")
@@ -271,7 +289,7 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
        try:
            # make a POST request to the API endpoint, stream=True
            response = requests.post(endpoint, headers=headers, proxies=proxies,
-                                    json=payload, stream=True, timeout=TIMEOUT_SECONDS);break
+                                    json=payload, stream=stream, timeout=TIMEOUT_SECONDS);break
        except:
            retry += 1
            chatbot[-1] = ((chatbot[-1][0], timeout_bot_msg))
@@ -279,10 +297,15 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
            yield from update_ui(chatbot=chatbot, history=history, msg="请求超时"+retry_msg) # 刷新界面
            if retry > MAX_RETRY: raise TimeoutError
    gpt_replying_buffer = ""
-    is_head_of_the_stream = True
+    if not stream:
        # This branch applies only to o1 models that do not support streaming; it does not apply to any other case
        yield from handle_o1_model_special(response, inputs, llm_kwargs, chatbot, history)
        return
    if stream:
        gpt_replying_buffer = ""
        is_head_of_the_stream = True
        stream_response =  response.iter_lines()
        while True:
            try:
@@ -343,12 +366,24 @@ def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWith
                    chunk_decoded = chunk.decode()
                    error_msg = chunk_decoded
                    chatbot, history = handle_error(inputs, llm_kwargs, chatbot, history, chunk_decoded, error_msg)
-                    yield from update_ui(chatbot=chatbot, history=history, msg="Json异常" + error_msg) # 刷新界面
+                    yield from update_ui(chatbot=chatbot, history=history, msg="Json解析异常" + error_msg) # 刷新界面
                    logger.error(error_msg)
                    return
        return  # return from stream-branch
 def handle_o1_model_special(response, inputs, llm_kwargs, chatbot, history):
    try:
        chunkjson = json.loads(response.content.decode())
        gpt_replying_buffer = chunkjson['choices'][0]["message"]["content"]
        log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
        history[-1] = gpt_replying_buffer
        chatbot[-1] = (history[-2], history[-1])
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
    except Exception as e:
        yield from update_ui(chatbot=chatbot, history=history, msg="Json解析异常" + response.text) # 刷新界面
 def handle_error(inputs, llm_kwargs, chatbot, history, chunk_decoded, error_msg):
-    from .bridge_all import model_info
+    from request_llms.bridge_all import model_info
    openai_website = ' 请登录OpenAI查看详情 https://platform.openai.com/signup'
    if "reduce the length" in error_msg:
        if len(history) >= 2: history[-1] = ""; history[-2] = "" # Clear the input that overflowed: history[-2] is this turn's input, history[-1] is this turn's output
@@ -381,6 +416,8 @@ def generate_payload(inputs:str, llm_kwargs:dict, history:list, system_prompt:st
    """
    Gather all the information, select the LLM model, and build the HTTP request in preparation for sending it
    """
    from request_llms.bridge_all import model_info
    if not is_any_api_key(llm_kwargs['api_key']):
        raise AssertionError("你提供了错误的API_KEY。\n\n1. 临时解决方案：直接在输入区键入api_key，然后回车提交。\n\n2. 长效解决方案：在config.py中配置。")
@@ -409,10 +446,16 @@ def generate_payload(inputs:str, llm_kwargs:dict, history:list, system_prompt:st
    else:
        enable_multimodal_capacity = False
    conversation_cnt = len(history) // 2
    openai_disable_system_prompt = model_info[llm_kwargs['llm_model']].get('openai_disable_system_prompt', False)
    if openai_disable_system_prompt:
        messages = [{"role": "user", "content": system_prompt}]
    else:
        messages = [{"role": "system", "content": system_prompt}]
    if not enable_multimodal_capacity:
        # Multimodal capability not used
        conversation_cnt = len(history) // 2
        messages = [{"role": "system", "content": system_prompt}]
        if conversation_cnt:
            for index in range(0, 2*conversation_cnt, 2):
                what_i_have_asked = {}
@@ -434,8 +477,6 @@ def generate_payload(inputs:str, llm_kwargs:dict, history:list, system_prompt:st
        messages.append(what_i_ask_now)
    else:
        # Multimodal capability in use
        conversation_cnt = len(history) // 2
        messages = [{"role": "system", "content": system_prompt}]
        if conversation_cnt:
            for index in range(0, 2*conversation_cnt, 2):
                what_i_have_asked = {}
@@ -498,4 +539,3 @@ def generate_payload(inputs:str, llm_kwargs:dict, history:list, system_prompt:st
    return headers,payload
--- a/request_llms/bridge_cohere.py
+++ b/request_llms/bridge_cohere.py
@@ -111,7 +111,7 @@ def predict_no_ui_long_connection(inputs:str, llm_kwargs:dict, history:list=[],
        if chunkjson['event_type'] == 'stream-start': continue
        if chunkjson['event_type'] == 'text-generation':
            result += chunkjson["text"]
-            if not console_slience: logger.info(chunkjson["text"], end='')
+            if not console_slience: print(chunkjson["text"], end='')
            if observe_window is not None:
                # 观测窗，把已经获取的数据显示出去
                if len(observe_window) >= 1:
--- a/request_llms/bridge_ollama.py
+++ b/request_llms/bridge_ollama.py
@@ -99,7 +99,7 @@ def predict_no_ui_long_connection(inputs, llm_kwargs, history=[], sys_prompt="",
                    logger.info(f'[response] {result}')
                    break
                result += chunkjson['message']["content"]
-                if not console_slience: logger.info(chunkjson['message']["content"], end='')
+                if not console_slience: print(chunkjson['message']["content"], end='')
                if observe_window is not None:
                    # 观测窗，把已经获取的数据显示出去
                    if len(observe_window) >= 1:
--- a/request_llms/bridge_openrouter.py
+++ b/request_llms/bridge_openrouter.py
@@ -0,0 +1,541 @@
 """
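    This file mainly contains two functions.
    Function without multi-threading capability:
    1. predict: used for normal conversation; fully interactive; cannot be called from multiple threads
    Function that can be called from multiple threads:
    2. predict_no_ui_long_connection: supports multi-threading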
 """
 import json
 import os
 import re
 import time
 import traceback
 import requests
 import random
 from loguru import logger
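 # config_private.py holds your own secrets, such as API keys and proxy URLs
 # On load, first check whether a private config_private file exists (not tracked by git); if it does, it overrides the original config file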
 from toolbox import get_conf, update_ui, is_any_api_key, select_api_key, what_keys, clip_history
 from toolbox import trimmed_format_exc, is_the_upload_folder, read_one_api_model_name, log_chat
 from toolbox import ChatBotWithCookies, have_any_recent_upload_image_files, encode_image
 proxies, TIMEOUT_SECONDS, MAX_RETRY, API_ORG, AZURE_CFG_ARRAY = \
    get_conf('proxies', 'TIMEOUT_SECONDS', 'MAX_RETRY', 'API_ORG', 'AZURE_CFG_ARRAY')
 timeout_bot_msg = '[Local Message] Request timeout. Network error. Please check proxy settings in config.py.' + \
                  '网络错误，检查代理服务器是否可用，以及代理设置的格式是否正确，格式须是[协议]://[地址]:[端口]，缺一不可。'
 def get_full_error(chunk, stream_response):
    """
        Get the complete error message returned by OpenAI
    """
    while True:
        try:
            chunk += next(stream_response)
        except:
            break
    return chunk
 def make_multimodal_input(inputs, image_paths):
    image_base64_array = []
    for image_path in image_paths:
        path = os.path.abspath(image_path)
        base64 = encode_image(path)
        inputs = inputs + f'<br/><br/><div align="center"><img src="file={path}" base64="{base64}"></div>'
        image_base64_array.append(base64)
    return inputs, image_base64_array
 def reverse_base64_from_input(inputs):
    # Define a regular expression that matches Base64 strings (assumed format: base64="<Base64 data>")
    # pattern = re.compile(r'base64="([^"]+)"></div>')
    pattern = re.compile(r'<br/><br/><div align="center"><img[^<>]+base64="([^"]+)"></div>')
    # Use findall to find every matching Base64 string
    base64_strings = pattern.findall(inputs)
    # Return the list of Base64 strings recovered from the input
    return base64_strings
 def contain_base64(inputs):
    base64_strings = reverse_base64_from_input(inputs)
    return len(base64_strings) > 0
 def append_image_if_contain_base64(inputs):
    if not contain_base64(inputs):
        return inputs
    else:
        image_base64_array = reverse_base64_from_input(inputs)
        pattern = re.compile(r'<br/><br/><div align="center"><img[^><]+></div>')
        inputs = re.sub(pattern, '', inputs)
        res = []
        res.append({
            "type": "text",
            "text": inputs
        })
        for image_base64 in image_base64_array:
            res.append({
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image_base64}"
                }
            })
        return res
 def remove_image_if_contain_base64(inputs):
    if not contain_base64(inputs):
        return inputs
    else:
        pattern = re.compile(r'<br/><br/><div align="center"><img[^><]+></div>')
        inputs = re.sub(pattern, '', inputs)
        return inputs
 def decode_chunk(chunk):
    # Read some information in advance (used to detect anomalies)
    chunk_decoded = chunk.decode()
    chunkjson = None
    has_choices = False
    choice_valid = False
    has_content = False
    has_role = False
    try:
        chunkjson = json.loads(chunk_decoded[6:])
        has_choices = 'choices' in chunkjson
        if has_choices: choice_valid = (len(chunkjson['choices']) > 0)
        if has_choices and choice_valid: has_content = ("content" in chunkjson['choices'][0]["delta"])
        if has_content: has_content = (chunkjson['choices'][0]["delta"]["content"] is not None)
        if has_choices and choice_valid: has_role = "role" in chunkjson['choices'][0]["delta"]
    except:
        pass
    return chunk_decoded, chunkjson, has_choices, choice_valid, has_content, has_role
 from functools import lru_cache
 @lru_cache(maxsize=32)
 def verify_endpoint(endpoint):
    """
        Check whether the endpoint is usable
    """
    if "你亲手写的api名称" in endpoint:
        raise ValueError("Endpoint不正确, 请检查AZURE_ENDPOINT的配置! 当前的Endpoint为:" + endpoint)
    return endpoint
 def predict_no_ui_long_connection(inputs:str, llm_kwargs:dict, history:list=[], sys_prompt:str="", observe_window:list=None, console_slience:bool=False):
    """
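    Send the request to ChatGPT and wait for the reply; completes in one shot without showing intermediate output. Streaming is still used internally so that the connection is not cut off midway.
    inputs:
        The input of this query
    sys_prompt:
        The silent system prompt
    llm_kwargs:
        Internal tuning parameters of ChatGPT
    history:
        The list of previous conversation turns
    observe_window = None:
        Passes the already-generated output across threads; mostly used for a fancy visual effect and can usually be left empty. observe_window[0]: observation window. observe_window[1]: watchdog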
    """
    from request_llms.bridge_all import model_info
    watch_dog_patience = 5 # Watchdog patience; 5 seconds is enough
    if model_info[llm_kwargs['llm_model']].get('openai_disable_stream', False): stream = False
    else: stream = True
    headers, payload = generate_payload(inputs, llm_kwargs, history, system_prompt=sys_prompt, stream=stream)
    retry = 0
    while True:
        try:
            # make a POST request to the API endpoint, stream=False
            endpoint = verify_endpoint(model_info[llm_kwargs['llm_model']]['endpoint'])
            response = requests.post(endpoint, headers=headers, proxies=proxies,
                                    json=payload, stream=stream, timeout=TIMEOUT_SECONDS); break
        except requests.exceptions.ReadTimeout as e:
            retry += 1
            traceback.print_exc()
            if retry > MAX_RETRY: raise TimeoutError
            if MAX_RETRY!=0: logger.error(f'请求超时，正在重试 ({retry}/{MAX_RETRY}) ……')
    if not stream:
        # This branch applies only to o1 models that do not support streaming; it does not apply to any other case
        chunkjson = json.loads(response.content.decode())
        gpt_replying_buffer = chunkjson['choices'][0]["message"]["content"]
        return gpt_replying_buffer
    stream_response = response.iter_lines()
    result = ''
    json_data = None
    while True:
        try: chunk = next(stream_response)
        except StopIteration:
            break
        except requests.exceptions.ConnectionError:
            chunk = next(stream_response) # Failed; retry once. If it fails again, nothing more can be done.
        chunk_decoded, chunkjson, has_choices, choice_valid, has_content, has_role = decode_chunk(chunk)
        if len(chunk_decoded)==0: continue
        if not chunk_decoded.startswith('data:'):
            error_msg = get_full_error(chunk, stream_response).decode()
            if "reduce the length" in error_msg:
                raise ConnectionAbortedError("OpenAI拒绝了请求:" + error_msg)
            elif """type":"upstream_error","param":"307""" in error_msg:
                raise ConnectionAbortedError("正常结束，但显示Token不足，导致输出不完整，请削减单次输入的文本量。")
            else:
                raise RuntimeError("OpenAI拒绝了请求：" + error_msg)
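        if ('data: [DONE]' in chunk_decoded): break # api2d finished normally
        # Read some information in advance (used to detect anomalies)
        if (has_choices and not choice_valid) or ('OPENROUTER PROCESSING' in chunk_decoded):
            # Some low-quality third-party APIs produce this error; this is also the special handling for OpenRouter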
            continue
        json_data = chunkjson['choices'][0]
        delta = json_data["delta"]
        if len(delta) == 0: break
        if (not has_content) and has_role: continue
        if (not has_content) and (not has_role): continue # raise RuntimeError("发现不标准的第三方接口："+delta)
        if has_content: # has_role = True/False
            result += delta["content"]
            if not console_slience: print(delta["content"], end='')
            if observe_window is not None:
                # Observation window: publish the data received so far
                if len(observe_window) >= 1:
                    observe_window[0] += delta["content"]
                # Watchdog: terminate if it has not been fed within the time limit
                if len(observe_window) >= 2:
                    if (time.time()-observe_window[1]) > watch_dog_patience:
                        raise RuntimeError("用户取消了程序。")
        else: raise RuntimeError("意外Json结构："+str(delta))
    if json_data and json_data['finish_reason'] == 'content_filter':
        raise RuntimeError("由于提问含不合规内容被Azure过滤。")
    if json_data and json_data['finish_reason'] == 'length':
        raise ConnectionAbortedError("正常结束，但显示Token不足，导致输出不完整，请削减单次输入的文本量。")
    return result
 def predict(inputs:str, llm_kwargs:dict, plugin_kwargs:dict, chatbot:ChatBotWithCookies,
            history:list=[], system_prompt:str='', stream:bool=True, additional_fn:str=None):
    """
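    Send the request to ChatGPT and fetch the output as a stream.
    Used for the basic conversation feature.
    inputs is the input of this query
    top_p and temperature are internal tuning parameters of ChatGPT
    history is the list of previous conversation turns (note that if either inputs or history is too long, a token-overflow error is triggered)
    chatbot is the conversation list shown in the WebUI; modify it and then yield it to update the chat interface directly
    additional_fn indicates which button was clicked; see functional.py for the buttons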
    """
    from request_llms.bridge_all import model_info
    if is_any_api_key(inputs):
        chatbot._cookies['api_key'] = inputs
        chatbot.append(("输入已识别为openai的api_key", what_keys(inputs)))
        yield from update_ui(chatbot=chatbot, history=history, msg="api_key已导入") # 刷新界面
        return
    elif not is_any_api_key(chatbot._cookies['api_key']):
        chatbot.append((inputs, "缺少api_key。\n\n1. 临时解决方案：直接在输入区键入api_key，然后回车提交。\n\n2. 长效解决方案：在config.py中配置。"))
        yield from update_ui(chatbot=chatbot, history=history, msg="缺少api_key") # 刷新界面
        return
    user_input = inputs
    if additional_fn is not None:
        from core_functional import handle_core_functionality
        inputs, history = handle_core_functionality(additional_fn, inputs, history, chatbot)
    # Multimodal models
    has_multimodal_capacity = model_info[llm_kwargs['llm_model']].get('has_multimodal_capacity', False)
    if has_multimodal_capacity:
        has_recent_image_upload, image_paths = have_any_recent_upload_image_files(chatbot, pop=True)
    else:
        has_recent_image_upload, image_paths = False, []
    if has_recent_image_upload:
        _inputs, image_base64_array = make_multimodal_input(inputs, image_paths)
    else:
        _inputs, image_base64_array = inputs, []
    chatbot.append((_inputs, ""))
    yield from update_ui(chatbot=chatbot, history=history, msg="等待响应") # 刷新界面
    # Special handling for models with streaming disabled
    if model_info[llm_kwargs['llm_model']].get('openai_disable_stream', False): stream = False
    else: stream = True
    # check mis-behavior
    if is_the_upload_folder(user_input):
        chatbot[-1] = (inputs, f"[Local Message] 检测到操作错误！当您上传文档之后，需点击“**函数插件区**”按钮进行处理，请勿点击“提交”按钮或者“基础功能区”按钮。")
        yield from update_ui(chatbot=chatbot, history=history, msg="正常") # 刷新界面
        time.sleep(2)
    try:
        headers, payload = generate_payload(inputs, llm_kwargs, history, system_prompt, image_base64_array, has_multimodal_capacity, stream)
    except RuntimeError as e:
        chatbot[-1] = (inputs, f"您提供的api-key不满足要求，不包含任何可用于{llm_kwargs['llm_model']}的api-key。您可能选择了错误的模型或请求源。")
        yield from update_ui(chatbot=chatbot, history=history, msg="api-key不满足要求") # 刷新界面
        return
    # Check whether the endpoint is valid
    try:
        endpoint = verify_endpoint(model_info[llm_kwargs['llm_model']]['endpoint'])
    except:
        tb_str = '```\n' + trimmed_format_exc() + '```'
        chatbot[-1] = (inputs, tb_str)
        yield from update_ui(chatbot=chatbot, history=history, msg="Endpoint不满足要求") # 刷新界面
        return
    # Append to the history
    if has_recent_image_upload:
        history.extend([_inputs, ""])
    else:
        history.extend([inputs, ""])
    retry = 0
    while True:
        try:
            # make a POST request to the API endpoint, stream=True
            response = requests.post(endpoint, headers=headers, proxies=proxies,
                                    json=payload, stream=stream, timeout=TIMEOUT_SECONDS);break
        except:
            retry += 1
            chatbot[-1] = ((chatbot[-1][0], timeout_bot_msg))
            retry_msg = f"，正在重试 ({retry}/{MAX_RETRY}) ……" if MAX_RETRY > 0 else ""
            yield from update_ui(chatbot=chatbot, history=history, msg="请求超时"+retry_msg) # 刷新界面
            if retry > MAX_RETRY: raise TimeoutError
    if not stream:
        # This branch applies only to o1 models that do not support streaming; it does not apply to any other case
        yield from handle_o1_model_special(response, inputs, llm_kwargs, chatbot, history)
        return
    if stream:
        gpt_replying_buffer = ""
        is_head_of_the_stream = True
        stream_response =  response.iter_lines()
        while True:
            try:
                chunk = next(stream_response)
            except StopIteration:
                # Non-official OpenAI APIs produce this error; OpenAI and API2D never reach this point
                chunk_decoded = chunk.decode()
                error_msg = chunk_decoded
                # First rule out a third-party bug in which one-api sends no done packet
                if len(gpt_replying_buffer.strip()) > 0 and len(error_msg) == 0:
                    yield from update_ui(chatbot=chatbot, history=history, msg="检测到有缺陷的非OpenAI官方接口，建议选择更稳定的接口。")
                    break
                # In every other case, return the error directly
                chatbot, history = handle_error(inputs, llm_kwargs, chatbot, history, chunk_decoded, error_msg)
                yield from update_ui(chatbot=chatbot, history=history, msg="非OpenAI官方接口返回了错误:" + chunk.decode()) # 刷新界面
                return
            # Read some information in advance (used to detect anomalies)
            chunk_decoded, chunkjson, has_choices, choice_valid, has_content, has_role = decode_chunk(chunk)
            if is_head_of_the_stream and (r'"object":"error"' not in chunk_decoded) and (r"content" not in chunk_decoded):
                # The first frame of the stream carries no content
                is_head_of_the_stream = False; continue
            if chunk:
                try:
                    if (has_choices and not choice_valid) or ('OPENROUTER PROCESSING' in chunk_decoded):
                        # Some low-quality third-party APIs produce this error; this is also special handling for OpenRouter, whose stream emits OPENROUTER PROCESSING before it is connected to the model
                        continue
                    if ('data: [DONE]' not in chunk_decoded) and len(chunk_decoded) > 0 and (chunkjson is None):
                        # Something unexpected was passed in
                        raise ValueError(f'无法读取以下数据，请检查配置。\n\n{chunk_decoded}')
                    # The former is API2D's end condition; the latter is OpenAI's end condition
                    if ('data: [DONE]' in chunk_decoded) or (len(chunkjson['choices'][0]["delta"]) == 0):
                        # Treated as the end of the stream; gpt_replying_buffer is complete as well
                        log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
                        break
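                    # Process the body of the stream
                    status_text = f"finish_reason: {chunkjson['choices'][0].get('finish_reason', 'null')}"
                    # If an exception is raised here, the text is usually too long; see the output of get_full_error for details
                    if has_content:
                        # Normal case
                        gpt_replying_buffer = gpt_replying_buffer + chunkjson['choices'][0]["delta"]["content"]
                    elif has_role:
                        # Some third-party APIs produce this error; tolerate it
                        continue
                    else:
                        # By this point we are beyond what a well-behaved API should reach; some low-quality third-party APIs produce this error
                        if chunkjson['choices'][0]["delta"]["content"] is None: continue # Some low-quality third-party APIs produce this error; tolerate it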
                        gpt_replying_buffer = gpt_replying_buffer + chunkjson['choices'][0]["delta"]["content"]
                    history[-1] = gpt_replying_buffer
                    chatbot[-1] = (history[-2], history[-1])
                    yield from update_ui(chatbot=chatbot, history=history, msg=status_text) # 刷新界面
                except Exception as e:
                    yield from update_ui(chatbot=chatbot, history=history, msg="Json解析不合常规") # 刷新界面
                    chunk = get_full_error(chunk, stream_response)
                    chunk_decoded = chunk.decode()
                    error_msg = chunk_decoded
                    chatbot, history = handle_error(inputs, llm_kwargs, chatbot, history, chunk_decoded, error_msg)
                    yield from update_ui(chatbot=chatbot, history=history, msg="Json解析异常" + error_msg) # 刷新界面
                    logger.error(error_msg)
                    return
        return  # return from stream-branch
 def handle_o1_model_special(response, inputs, llm_kwargs, chatbot, history):
    try:
        chunkjson = json.loads(response.content.decode())
        gpt_replying_buffer = chunkjson['choices'][0]["message"]["content"]
        log_chat(llm_model=llm_kwargs["llm_model"], input_str=inputs, output_str=gpt_replying_buffer)
        history[-1] = gpt_replying_buffer
        chatbot[-1] = (history[-2], history[-1])
        yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
    except Exception as e:
        yield from update_ui(chatbot=chatbot, history=history, msg="Json解析异常" + response.text) # 刷新界面
 def handle_error(inputs, llm_kwargs, chatbot, history, chunk_decoded, error_msg):
    from request_llms.bridge_all import model_info
    openai_website = ' 请登录OpenAI查看详情 https://platform.openai.com/signup'
    if "reduce the length" in error_msg:
        if len(history) >= 2: history[-1] = ""; history[-2] = "" # Clear the input that overflowed: history[-2] is this turn's input, history[-1] is this turn's output
        history = clip_history(inputs=inputs, history=history, tokenizer=model_info[llm_kwargs['llm_model']]['tokenizer'],
                                               max_token_limit=(model_info[llm_kwargs['llm_model']]['max_token'])) # Release at least half of the history
        chatbot[-1] = (chatbot[-1][0], "[Local Message] Reduce the length. 本次输入过长, 或历史数据过长. 历史缓存数据已部分释放, 您可以请再次尝试. (若再次失败则更可能是因为输入过长.)")
    elif "does not exist" in error_msg:
        chatbot[-1] = (chatbot[-1][0], f"[Local Message] Model {llm_kwargs['llm_model']} does not exist. 模型不存在, 或者您没有获得体验资格.")
    elif "Incorrect API key" in error_msg:
        chatbot[-1] = (chatbot[-1][0], "[Local Message] Incorrect API key. OpenAI以提供了不正确的API_KEY为由, 拒绝服务. " + openai_website)
    elif "exceeded your current quota" in error_msg:
        chatbot[-1] = (chatbot[-1][0], "[Local Message] You exceeded your current quota. OpenAI以账户额度不足为由, 拒绝服务." + openai_website)
    elif "account is not active" in error_msg:
        chatbot[-1] = (chatbot[-1][0], "[Local Message] Your account is not active. OpenAI以账户失效为由, 拒绝服务." + openai_website)
    elif "associated with a deactivated account" in error_msg:
        chatbot[-1] = (chatbot[-1][0], "[Local Message] You are associated with a deactivated account. OpenAI以账户失效为由, 拒绝服务." + openai_website)
    elif "API key has been deactivated" in error_msg:
        chatbot[-1] = (chatbot[-1][0], "[Local Message] API key has been deactivated. OpenAI以账户失效为由, 拒绝服务." + openai_website)
    elif "bad forward key" in error_msg:
        chatbot[-1] = (chatbot[-1][0], "[Local Message] Bad forward key. API2D账户额度不足.")
    elif "Not enough point" in error_msg:
        chatbot[-1] = (chatbot[-1][0], "[Local Message] Not enough point. API2D账户点数不足.")
    else:
        from toolbox import regular_txt_to_markdown
        tb_str = '```\n' + trimmed_format_exc() + '```'
        chatbot[-1] = (chatbot[-1][0], f"[Local Message] 异常 \n\n{tb_str} \n\n{regular_txt_to_markdown(chunk_decoded)}")
    return chatbot, history
 def generate_payload(inputs:str, llm_kwargs:dict, history:list, system_prompt:str, image_base64_array:list=[], has_multimodal_capacity:bool=False, stream:bool=True):
    """
    Gather all the information, select the LLM model, and build the HTTP request in preparation for sending it
    """
    from request_llms.bridge_all import model_info
    if not is_any_api_key(llm_kwargs['api_key']):
        raise AssertionError("你提供了错误的API_KEY。\n\n1. 临时解决方案：直接在输入区键入api_key，然后回车提交。\n\n2. 长效解决方案：在config.py中配置。")
    if llm_kwargs['llm_model'].startswith('vllm-'):
        api_key = 'no-api-key'
    else:
        api_key = select_api_key(llm_kwargs['api_key'], llm_kwargs['llm_model'])
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    if API_ORG.startswith('org-'): headers.update({"OpenAI-Organization": API_ORG})
    if llm_kwargs['llm_model'].startswith('azure-'):
        headers.update({"api-key": api_key})
        if llm_kwargs['llm_model'] in AZURE_CFG_ARRAY.keys():
            azure_api_key_unshared = AZURE_CFG_ARRAY[llm_kwargs['llm_model']]["AZURE_API_KEY"]
            headers.update({"api-key": azure_api_key_unshared})
    if has_multimodal_capacity:
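        # Multimodal capability is enabled when condition 1 holds and at least one of conditions 2 and 3 holds:
        # 1. The model itself is multimodal (has_multimodal_capacity)
        # 2. The input contains images (len(image_base64_array) > 0)
        # 3. The history contains images (any([contain_base64(h) for h in history]))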
        enable_multimodal_capacity = (len(image_base64_array) > 0) or any([contain_base64(h) for h in history])
    else:
        enable_multimodal_capacity = False
    conversation_cnt = len(history) // 2
    openai_disable_system_prompt = model_info[llm_kwargs['llm_model']].get('openai_disable_system_prompt', False)
    if openai_disable_system_prompt:
        messages = [{"role": "user", "content": system_prompt}]
    else:
        messages = [{"role": "system", "content": system_prompt}]
    if not enable_multimodal_capacity:
        # Multimodal capability not used
        if conversation_cnt:
            for index in range(0, 2*conversation_cnt, 2):
                what_i_have_asked = {}
                what_i_have_asked["role"] = "user"
                what_i_have_asked["content"] = remove_image_if_contain_base64(history[index])
                what_gpt_answer = {}
                what_gpt_answer["role"] = "assistant"
                what_gpt_answer["content"] = remove_image_if_contain_base64(history[index+1])
                if what_i_have_asked["content"] != "":
                    if what_gpt_answer["content"] == "": continue
                    if what_gpt_answer["content"] == timeout_bot_msg: continue
                    messages.append(what_i_have_asked)
                    messages.append(what_gpt_answer)
                else:
                    messages[-1]['content'] = what_gpt_answer['content']
        what_i_ask_now = {}
        what_i_ask_now["role"] = "user"
        what_i_ask_now["content"] = inputs
        messages.append(what_i_ask_now)
    else:
        # Multimodal capability in use
        if conversation_cnt:
            for index in range(0, 2*conversation_cnt, 2):
                what_i_have_asked = {}
                what_i_have_asked["role"] = "user"
                what_i_have_asked["content"] = append_image_if_contain_base64(history[index])
                what_gpt_answer = {}
                what_gpt_answer["role"] = "assistant"
                what_gpt_answer["content"] = append_image_if_contain_base64(history[index+1])
                if what_i_have_asked["content"] != "":
                    if what_gpt_answer["content"] == "": continue
                    if what_gpt_answer["content"] == timeout_bot_msg: continue
                    messages.append(what_i_have_asked)
                    messages.append(what_gpt_answer)
                else:
                    messages[-1]['content'] = what_gpt_answer['content']
        what_i_ask_now = {}
        what_i_ask_now["role"] = "user"
        what_i_ask_now["content"] = []
        what_i_ask_now["content"].append({
            "type": "text",
            "text": inputs
        })
        for image_base64 in image_base64_array:
            what_i_ask_now["content"].append({
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image_base64}"
                }
            })
        messages.append(what_i_ask_now)
    model = llm_kwargs['llm_model']
    if llm_kwargs['llm_model'].startswith('api2d-'):
        model = llm_kwargs['llm_model'][len('api2d-'):]
    if llm_kwargs['llm_model'].startswith('one-api-'):
        model = llm_kwargs['llm_model'][len('one-api-'):]
        model, _ = read_one_api_model_name(model)
    if llm_kwargs['llm_model'].startswith('vllm-'):
        model = llm_kwargs['llm_model'][len('vllm-'):]
        model, _ = read_one_api_model_name(model)
    if llm_kwargs['llm_model'].startswith('openrouter-'):
        model = llm_kwargs['llm_model'][len('openrouter-'):]
        model, _ = read_one_api_model_name(model)
    if model == "gpt-3.5-random":  # pick at random to work around OpenAI rate limits
        model = random.choice([
            "gpt-3.5-turbo",
            "gpt-3.5-turbo-16k",
            "gpt-3.5-turbo-1106",
            "gpt-3.5-turbo-0613",
            "gpt-3.5-turbo-16k-0613",
            "gpt-3.5-turbo-0301",
        ])
    payload = {
        "model": model,
        "messages": messages,
        "temperature": llm_kwargs['temperature'],  # 1.0,
        "top_p": llm_kwargs['top_p'],  # 1.0,
        "n": 1,
        "stream": stream,
    }
    return headers, payload
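Review note: the prefix handling above strips a routing prefix (`api2d-`, `one-api-`, `vllm-`, `openrouter-`) from `llm_kwargs['llm_model']` before the name goes into the payload. A minimal sketch of that step in isolation follows. `strip_provider_prefix` and `PROVIDER_PREFIXES` are hypothetical names introduced here for illustration, and the sketch does not reproduce whatever extra parsing `read_one_api_model_name` performs.

```python
# Hypothetical helper, not part of the repository: it only mirrors the
# prefix-stripping step and does not reproduce read_one_api_model_name.
PROVIDER_PREFIXES = ("api2d-", "one-api-", "vllm-", "openrouter-")


def strip_provider_prefix(llm_model: str) -> str:
    """Return the model name with a known routing prefix removed, if any."""
    for prefix in PROVIDER_PREFIXES:
        if llm_model.startswith(prefix):
            return llm_model[len(prefix):]
    return llm_model


print(strip_provider_prefix("openrouter-some-vendor/some-model"))
print(strip_provider_prefix("gpt-3.5-turbo"))
```

A single table of prefixes would keep the four `startswith` branches in sync, though the branches that call `read_one_api_model_name` would still need that extra step.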
--- a/request_llms/oai_std_model_template.py
+++ b/request_llms/oai_std_model_template.py
@@ -224,7 +224,7 @@ def get_predict_function(
                try:
                    if finish_reason == "stop":
                        if not console_slience:
-                            logger.info(f"[response] {result}")
+                            print(f"[response] {result}")
                        break
                    result += response_text
                    if observe_window is not None:
--- a/requirements.txt
+++ b/requirements.txt
@@ -2,14 +2,15 @@ https://public.agent-matrix.com/publish/gradio-3.32.10-py3-none-any.whl
 fastapi==0.110
 gradio-client==0.8
 pypdf2==2.12.1
 httpx<=0.25.2
 zhipuai==2.0.1
 tiktoken>=0.3.3
 requests[socks]
-pydantic==2.5.2
+pydantic==2.9.2
 llama-index==0.10
 protobuf==3.20
 transformers>=4.27.1,<4.42
 scipdf_parser>=0.52
 spacy==3.7.4
 anthropic>=0.18.1
 python-markdown-math
 pymdown-extensions
@@ -32,3 +33,14 @@ loguru
 arxiv
 numpy
 rich
 llama-index-core==0.10.68
 llama-index-legacy==0.9.48
 llama-index-readers-file==0.1.33
 llama-index-readers-llama-parse==0.1.6
 llama-index-embeddings-azure-openai==0.1.10
 llama-index-embeddings-openai==0.1.10
 llama-parse==0.4.9
 mdit-py-plugins>=0.3.3
 linkify-it-py==2.0.3
--- a/shared_utils/config_loader.py
+++ b/shared_utils/config_loader.py
@@ -94,7 +94,7 @@ def read_single_conf_with_lru_cache(arg):
        if r is None:
            log亮红('[PROXY] 网络代理状态：未配置。无代理状态下很可能无法访问OpenAI家族的模型。建议：检查USE_PROXY选项是否修改。')
        else:
-            log亮绿('[PROXY] 网络代理状态：已配置。配置信息如下：', r)
+            log亮绿('[PROXY] 网络代理状态：已配置。配置信息如下：', str(r))
            assert isinstance(r, dict), 'proxies格式错误，请注意proxies选项的格式，不要遗漏括号。'
    return r
--- a/shared_utils/cookie_manager.py
+++ b/shared_utils/cookie_manager.py
@@ -90,23 +90,6 @@ def make_history_cache():
 # """
 # with gr.Row():
 #     txt = gr.Textbox(show_label=False, placeholder="Input question here.", elem_id='user_input_main').style(container=False)
 #     txtx = gr.Textbox(show_label=False, placeholder="Input question here.", elem_id='user_input_main').style(container=False)
 # with gr.Row():
 #     btn_value = "Test"
 #     elem_id = "TestCase"
 #     variant = "primary"
 #     input_list = [txt, txtx]
 #     output_list = [txt, txtx]
 #     input_name_list = ["txt(input)", "txtx(input)"]
 #     output_name_list = ["txt", "txtx"]
 #     js_callback = """(txt, txtx)=>{console.log(txt); console.log(txtx);}"""
 #     def function(txt, txtx):
 #         return "booo", "goooo"
 #     create_button_with_javascript_callback(btn_value, elem_id, variant, js_callback, input_list, output_list, function, input_name_list, output_name_list)
 # """
 def create_button_with_javascript_callback(btn_value, elem_id, variant, js_callback, input_list, output_list, function, input_name_list, output_name_list):
    import gradio as gr
    middle_ware_component = gr.Textbox(visible=False, elem_id=elem_id+'_buffer')
--- a/shared_utils/key_pattern_manager.py
+++ b/shared_utils/key_pattern_manager.py
@@ -34,6 +34,9 @@ def is_api2d_key(key):
    API_MATCH_API2D = re.match(r"fk[a-zA-Z0-9]{6}-[a-zA-Z0-9]{32}$", key)
    return bool(API_MATCH_API2D)
 def is_openroute_api_key(key):
    API_MATCH_OPENROUTE = re.match(r"sk-or-v1-[a-zA-Z0-9]{64}$", key)
    return bool(API_MATCH_OPENROUTE)
 def is_cohere_api_key(key):
    API_MATCH_AZURE = re.match(r"[a-zA-Z0-9]{40}$", key)
@@ -89,6 +92,10 @@ def select_api_key(keys, llm_model):
    if llm_model.startswith('cohere-'):
        for k in key_list:
            if is_cohere_api_key(k): avail_key_list.append(k)
    if llm_model.startswith('openrouter-'):
        for k in key_list:
            if is_openroute_api_key(k): avail_key_list.append(k)
    if len(avail_key_list) == 0:
        raise RuntimeError(f"您提供的api-key不满足要求，不包含任何可用于{llm_model}的api-key。您可能选择了错误的模型或请求源（左上角更换模型菜单中可切换openai,azure,claude,cohere等请求源）。")
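Review note: the new `openrouter-` branch only accepts keys that match the `sk-or-v1-` pattern added in `is_openroute_api_key`. The sketch below exercises that pattern on its own. The regex is copied from the hunk above. `filter_openrouter_keys` is a hypothetical helper, and splitting the key string on commas is an assumption, not something shown in this diff.

```python
import re

# Pattern copied from is_openroute_api_key in the hunk above.
OPENROUTER_KEY = re.compile(r"sk-or-v1-[a-zA-Z0-9]{64}$")


def is_openroute_api_key(key: str) -> bool:
    """True when the key looks like an OpenRouter key (prefix plus 64 alphanumerics)."""
    return bool(OPENROUTER_KEY.match(key))


def filter_openrouter_keys(keys: str) -> list:
    """Hypothetical helper: keep only OpenRouter-style keys.

    Assumes `keys` is a comma-separated string; that format is an assumption.
    """
    candidates = [k.strip() for k in keys.split(",")]
    return [k for k in candidates if is_openroute_api_key(k)]


# A made-up key with the right shape, for illustration only.
fake_key = "sk-or-v1-" + "a" * 64
print(is_openroute_api_key(fake_key))
print(filter_openrouter_keys("fk123456, " + fake_key))
```

If no key survives the filter, `select_api_key` raises the `RuntimeError` shown above, so a malformed key and a wrong model choice produce the same error message.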
--- a/shared_utils/logging.py
+++ b/shared_utils/logging.py
@@ -11,7 +11,7 @@ def not_chat_log_filter(record):
 def formatter_with_clip(record):
    # Note this function returns the string to be formatted, not the actual message to be logged
-    record["extra"]["serialized"] = "555555"
+    # record["extra"]["serialized"] = "555555"
    max_len = 12
    record['function_x'] = record['function'].center(max_len)
    if len(record['function_x']) > max_len:
--- a/tests/test_anim_gen.py
+++ b/tests/test_anim_gen.py
@@ -0,0 +1,12 @@
 """
 Tests the plugins in this project. To run: python tests/test_plugins.py
 """
 import init_test
 import os, sys
 if __name__ == "__main__":
    from test_utils import plugin_test
    plugin_test(plugin='crazy_functions.数学动画生成manim->动画生成', main_input="A point moving along the function curve y=sin(x), starting from x=0 and stopping at x=4*\\pi.")
--- a/tests/test_doc2x.py
+++ b/tests/test_doc2x.py
@@ -0,0 +1,7 @@
 import init_test
 from crazy_functions.pdf_fns.parse_pdf_via_doc2x import 解析PDF_DOC2X_转Latex
 # 解析PDF_DOC2X_转Latex("gpt_log/arxiv_cache_old/2410.10819/workfolder/merge.pdf")
 # 解析PDF_DOC2X_转Latex("gpt_log/arxiv_cache_ooo/2410.07095/workfolder/merge.pdf")
 解析PDF_DOC2X_转Latex("2410.11190v2.pdf")
--- a/tests/test_social_helper.py
+++ b/tests/test_social_helper.py
@@ -8,4 +8,17 @@ import os, sys
 if __name__ == "__main__":
    from test_utils import plugin_test
-    plugin_test(plugin='crazy_functions.Social_Helper->I人助手', main_input="|")
+    plugin_test(
        plugin='crazy_functions.Social_Helper->I人助手', 
        main_input="""
 添加联系人：
 艾德·史塔克：我的养父，他是临冬城的公爵。
 凯特琳·史塔克：我的养母，她对我态度冷淡，因为我是私生子。
 罗柏·史塔克：我的哥哥，他是北境的继承人。
 艾莉亚·史塔克：我的妹妹，她和我关系亲密，性格独立坚强。
 珊莎·史塔克：我的妹妹，她梦想成为一位淑女。
 布兰·史塔克：我的弟弟，他有预知未来的能力。
 瑞肯·史塔克：我的弟弟，他是个天真无邪的小孩。
 山姆威尔·塔利：我的朋友，他在守夜人军团中与我并肩作战。
 伊格瑞特：我的恋人，她是野人中的一员。
        """)
--- a/4
+++ b/4
@@ -1,5 +1,5 @@
 {
-  "version": 3.83,
+  "version": 3.90,
  "show_feature": true,
-  "new_feature": "增加欢迎页面 <-> 优化图像生成插件 <-> 添加紫东太初大模型支持 <-> 保留主题选择 <-> 支持更复杂的插件框架 <-> 上传文件时显示进度条"
+  "new_feature": "增加RAG组件 <-> 升级多合一主提交键"
 }
Author	SHA1	Message	Commit date
lbykkkk	61676d0536	up	2024-11-06 00:47:56 +08:00
lbykkkk	df2ef7940c	up	2024-11-05 02:08:12 +08:00
lbykkkk	c10f2b45e5	Default prompt word count control	2024-11-03 23:05:02 +08:00
lbykkkk	7e2ede2d12	up	2024-11-03 22:54:19 +08:00
lbykkkk	ec10e2a3ac	Merge branch 'refs/heads/batch-file-query' into boyin_summary # Conflicts: # crazy_functional.py	2024-11-03 22:49:29 +08:00
binary-husky	7474d43433	stage connection	2024-11-03 14:19:16 +00:00
binary-husky	83489f9acf	Merge remote-tracking branch 'origin/boyin_summary'	2024-11-03 14:12:04 +00:00
lbykkkk	36e50d490d	up	2024-11-03 17:57:56 +08:00
lbykkkk	9172337695	Add batch document inquiry function	2024-11-03 17:17:16 +08:00
binary-husky	180550b8f0	upgrade auto comment	2024-10-30 13:37:35 +00:00
binary-husky	7497dcb852	catch comment source code exception	2024-10-30 11:40:47 +00:00
lbykkkk	5dab7b2290	refine	2024-10-29 23:54:55 +08:00
binary-husky	23ef2ffb22	feat: change arxiv io param	2024-10-27 16:54:29 +00:00
binary-husky	848d0f65c7	share paper network beta	2024-10-27 16:08:25 +00:00
Menghuan1918	f0b0364f74	Fix and improve the build-with-latex Docker build (#2020 ) * Improve build files * Fix issues * Update Docker comments and test the pull size	2024-10-27 23:17:03 +08:00
lbykkkk	89dc6c7265	refine	2024-10-21 22:58:04 +08:00
binary-husky	69f3755682	adjust max_token_limit for pdf translation plugin	2024-10-21 14:31:11 +00:00
binary-husky	4727113243	update doc2x functions	2024-10-21 14:05:42 +00:00
lbykkkk	21111d3bd0	refine	2024-10-21 00:57:29 +08:00
lbykkkk	701018f48c	up	2024-10-21 00:30:18 +08:00
lbykkkk	8733c4e1e9	file type support	2024-10-20 01:33:00 +08:00
lbykkkk	8498ddf6bf	up	2024-10-19 17:31:30 +00:00
lbykkkk	3c3293818d	Change the word document summary function to document summary function	2024-10-20 01:14:42 +08:00
wsg1873	310122f5a7	solve the concatenate error. (#2011 )	2024-10-16 00:56:24 +08:00
binary-husky	c83bf214d0	change arxiv download attempt url order	2024-10-15 09:09:24 +00:00
binary-husky	e34c49dce5	compat: deal with arxiv url change	2024-10-15 09:07:39 +00:00
binary-husky	3890467c84	replace `rm` with `rm -f`	2024-10-15 07:32:29 +00:00
binary-husky	074b3c9828	explicitly declare default value	2024-10-15 06:41:12 +00:00
Nextstrain	b8e8457a01	Fix o1-series models failing to send requests; fix KeyError: 'finish_reason' in multi-model polling (#1992 ) * Update bridge_all.py * Update bridge_chatgpt.py * Update bridge_chatgpt.py * Update bridge_all.py * Update bridge_all.py	2024-10-15 14:36:51 +08:00
binary-husky	2c93a24d7e	fix dockerfile: try align python	2024-10-15 06:35:35 +00:00
binary-husky	e9af6ef3a0	fix: github action glitch	2024-10-15 06:32:47 +00:00
wsg1873	5ae8981dbb	add the '/Fit' destination (#2009 )	2024-10-14 22:50:56 +08:00
binary-husky	adbed044e4	fix o1 compat problem	2024-10-13 17:02:07 +00:00
Menghuan1918	2fe5febaf0	Add arm64 support to the build-with-latex Docker build (#1994 ) * Add arm64 support * Bug fix * Some build bug fix * Add arm support * Separate arm and x86 builds * Improve build documentation * update tags * Update build-with-latex-arm.yml * Revert "Update build-with-latex-arm.yml" This reverts commit `9af92549b5`. * Update * Add * httpx * Addison * Update GithubAction+NoLocal+Latex * Update docker-compose.yml and GithubAction+NoLocal+Latex * Update README.md * test math anim generation * solve the pdf concatenate error. (#2006) * solve the pdf concatenate error. * add legacy fallback option --------- Co-authored-by: binary-husky <qingxu.fu@outlook.com> --------- Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com> Co-authored-by: binary-husky <qingxu.fu@outlook.com> Co-authored-by: wsg1873 <wsg0326@163.com>	2024-10-14 00:25:28 +08:00
wsg1873	f54d8e559a	solve the pdf concatenate error. (#2006 ) * solve the pdf concatenate error. * add legacy fallback option --------- Co-authored-by: binary-husky <qingxu.fu@outlook.com>	2024-10-13 16:16:51 +08:00
binary-husky	e68fc2bc69	Merge branch 'master' of github.com:binary-husky/chatgpt_academic	2024-10-11 13:33:05 +00:00
binary-husky	f695d7f1da	test math anim generation	2024-10-11 13:32:57 +00:00
binary-husky	679352d896	Update README.md	2024-10-10 13:38:35 +08:00
binary-husky	12c9ab1e33	Update README.md	2024-10-10 12:02:12 +08:00
binary-husky	da4a5efc49	lazy load llama-index lib	2024-10-06 16:26:26 +00:00
binary-husky	9ac450cfb6	Urgent fix: fix httpx breaking bad error	2024-10-06 15:02:14 +00:00
binary-husky	172f9e220b	version 3.90	2024-10-05 16:51:08 +00:00
binary-husky	a28b7d8475	Merge branch 'master' of https://github.com/binary-husky/gpt_academic	2024-10-05 19:10:42 +08:00
binary-husky	7d3ed36899	fix: llama index deps verion limit	2024-10-05 19:10:38 +08:00
binary-husky	a7bc5fa357	remove out-dated jittor models	2024-10-05 10:58:45 +00:00
binary-husky	4f5dd9ebcf	add temp solution for llama-index compat	2024-10-05 09:53:21 +00:00
binary-husky	427feb99d8	llama-index==0.10.5	2024-10-05 17:34:08 +08:00
binary-husky	a01ca93362	Merge Latest Frontier (#1991 ) * logging sys to loguru: stage 1 complete * import loguru: stage 2 * logging -> loguru: stage 3 * support o1-preview and o1-mini * logging -> loguru stage 4 * update social helper * logging -> loguru: final stage * fix: console output * update translation matrix * fix: loguru argument error with proxy enabled (#1977) * relax llama index version * remove comment * Added some modules to support openrouter (#1975) * Added some modules for supporting openrouter model Added some modules for supporting openrouter model * Update config.py * Update .gitignore * Update bridge_openrouter.py * Not changed actually * Refactor logging in bridge_openrouter.py --------- Co-authored-by: binary-husky <qingxu.fu@outlook.com> * remove logging extra --------- Co-authored-by: Steven Moder <java20131114@gmail.com> Co-authored-by: Ren Lifei <2602264455@qq.com>	2024-10-05 17:09:18 +08:00
binary-husky	597c320808	fix: system prompt err when using o1 models	2024-09-14 17:04:01 +00:00
binary-husky	18290fd138	fix: support o1 models	2024-09-14 17:00:02 +00:00
binary-husky	0d0575a639	support o1-preview and o1-mini	2024-09-13 03:12:18 +00:00
		`@@ -1 +0,0 @@`
			`# This Dockerfile is no longer maintained; please go to docs/GithubAction+JittorLLMs`