比较提交

...

278 次代码提交

作者 SHA1 备注 提交日期
binary-husky
82e125d439 log user name during chat 2024-11-10 16:50:24 +00:00
binary-husky
197287fc30 Enhance archive extraction with error handling for tar and gzip formats 2024-11-09 10:10:46 +00:00
Bingchen Jiang
c37fcc9299 Adding support to new openai apikey format (#2030) 2024-11-09 13:41:19 +08:00
binary-husky
91f5e6b8f7 resolve pickle security issue 2024-11-04 13:49:49 +00:00
hcy2206
4f0851f703 增加了对于glm-4-plus的支持 (#2014)
* 增加对于讯飞星火大模型Spark4.0的支持

* Create github action sync.yml

* 增加对于智谱glm-4-plus的支持

* feat: change arxiv io param

* catch comment source code exception

* upgrade auto comment

* add security patch

---------

Co-authored-by: GH Action - Upstream Sync <action@github.com>
Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-11-03 22:41:16 +08:00
binary-husky
2821f27756 add security patch 2024-11-03 14:34:17 +00:00
binary-husky
180550b8f0 upgrade auto comment 2024-10-30 13:37:35 +00:00
binary-husky
7497dcb852 catch comment source code exception 2024-10-30 11:40:47 +00:00
binary-husky
23ef2ffb22 feat: change arxiv io param 2024-10-27 16:54:29 +00:00
binary-husky
848d0f65c7 share paper network beta 2024-10-27 16:08:25 +00:00
Menghuan1918
f0b0364f74 修复并改进build with latex的Docker构建 (#2020)
* 改进构建文件

* 修复问题

* 更改docker注释,同时测试拉取大小
2024-10-27 23:17:03 +08:00
binary-husky
69f3755682 adjust max_token_limit for pdf translation plugin 2024-10-21 14:31:11 +00:00
binary-husky
4727113243 update doc2x functions 2024-10-21 14:05:42 +00:00
wsg1873
310122f5a7 solve the concatenate error. (#2011) 2024-10-16 00:56:24 +08:00
binary-husky
c83bf214d0 change arxiv download attempt url order 2024-10-15 09:09:24 +00:00
binary-husky
e34c49dce5 compat: deal with arxiv url change 2024-10-15 09:07:39 +00:00
binary-husky
3890467c84 replace rm with rm -f 2024-10-15 07:32:29 +00:00
binary-husky
074b3c9828 explicitly declare default value 2024-10-15 06:41:12 +00:00
Nextstrain
b8e8457a01 关于o1系列模型无法正常请求的修复,多模型轮询KeyError: 'finish_reason'的修复 (#1992)
* Update bridge_all.py

* Update bridge_chatgpt.py

* Update bridge_chatgpt.py

* Update bridge_all.py

* Update bridge_all.py
2024-10-15 14:36:51 +08:00
binary-husky
2c93a24d7e fix dockerfile: try align python 2024-10-15 06:35:35 +00:00
binary-husky
e9af6ef3a0 fix: github action glitch 2024-10-15 06:32:47 +00:00
wsg1873
5ae8981dbb add the '/Fit' destination (#2009) 2024-10-14 22:50:56 +08:00
binary-husky
adbed044e4 fix o1 compat problem 2024-10-13 17:02:07 +00:00
Menghuan1918
2fe5febaf0 为build-with-latex版本Docker构建新增arm64支持 (#1994)
* Add arm64 support

* Bug fix

* Some build bug fix

* Add arm support

* 分离arm和x86构建

* 改进构建文档

* update tags

* Update build-with-latex-arm.yml

* Revert "Update build-with-latex-arm.yml"

This reverts commit 9af92549b5.

* Update

* Add

* httpx

* Addison

* Update GithubAction+NoLocal+Latex

* Update docker-compose.yml and GithubAction+NoLocal+Latex

* Update README.md

* test math anim generation

* solve the pdf concatenate error. (#2006)

* solve the pdf concatenate error.

* add legacy fallback option

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: wsg1873 <wsg0326@163.com>
2024-10-14 00:25:28 +08:00
wsg1873
f54d8e559a solve the pdf concatenate error. (#2006)
* solve the pdf concatenate error.

* add legacy fallback option

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-10-13 16:16:51 +08:00
binary-husky
e68fc2bc69 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2024-10-11 13:33:05 +00:00
binary-husky
f695d7f1da test math anim generation 2024-10-11 13:32:57 +00:00
binary-husky
679352d896 Update README.md 2024-10-10 13:38:35 +08:00
binary-husky
12c9ab1e33 Update README.md 2024-10-10 12:02:12 +08:00
binary-husky
da4a5efc49 lazy load llama-index lib 2024-10-06 16:26:26 +00:00
binary-husky
9ac450cfb6 紧急修复 fix httpx breaking bad error 2024-10-06 15:02:14 +00:00
binary-husky
172f9e220b version 3.90 2024-10-05 16:51:08 +00:00
binary-husky
a28b7d8475 Merge branch 'master' of https://github.com/binary-husky/gpt_academic 2024-10-05 19:10:42 +08:00
binary-husky
7d3ed36899 fix: llama index deps verion limit 2024-10-05 19:10:38 +08:00
binary-husky
a7bc5fa357 remove out-dated jittor models 2024-10-05 10:58:45 +00:00
binary-husky
4f5dd9ebcf add temp solution for llama-index compat 2024-10-05 09:53:21 +00:00
binary-husky
427feb99d8 llama-index==0.10.5 2024-10-05 17:34:08 +08:00
binary-husky
a01ca93362 Merge Latest Frontier (#1991)
* logging sys to loguru: stage 1 complete

* import loguru: stage 2

* logging -> loguru: stage 3

* support o1-preview and o1-mini

* logging -> loguru stage 4

* update social helper

* logging -> loguru: final stage

* fix: console output

* update translation matrix

* fix: loguru argument error with proxy enabled (#1977)

* relax llama index version

* remove comment

* Added some modules to support openrouter (#1975)

* Added some modules for supporting openrouter model

Added some modules for supporting openrouter model

* Update config.py

* Update .gitignore

* Update bridge_openrouter.py

* Not changed actually

* Refactor logging in bridge_openrouter.py

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* remove logging extra

---------

Co-authored-by: Steven Moder <java20131114@gmail.com>
Co-authored-by: Ren Lifei <2602264455@qq.com>
2024-10-05 17:09:18 +08:00
binary-husky
597c320808 fix: system prompt err when using o1 models 2024-09-14 17:04:01 +00:00
binary-husky
18290fd138 fix: support o1 models 2024-09-14 17:00:02 +00:00
binary-husky
0d0575a639 support o1-preview and o1-mini 2024-09-13 03:12:18 +00:00
binary-husky
4e041e1d4e Merge branch 'frontier': windows deps bug fix 2024-09-08 16:32:38 +00:00
binary-husky
7ef39770c7 fallback to simple vs in windows system 2024-09-09 00:27:02 +08:00
binary-husky
8222f638cf Merge branch 'frontier' 2024-09-08 15:46:13 +00:00
binary-husky
ab32c314ab change git ignore 2024-09-08 15:44:02 +00:00
binary-husky
dcfed97054 revise milvus rag 2024-09-08 15:43:01 +00:00
binary-husky
dd66ca26f7 Frontier (#1958)
* update welcome svg

* fix loading chatglm3 (#1937)

* update welcome svg

* update welcome message

* fix loading chatglm3

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* begin rag project with llama index

* rag version one

* rag beta release

* add social worker (proto)

* fix llamaindex version

---------

Co-authored-by: moetayuko <loli@yuko.moe>
2024-09-08 23:20:42 +08:00
binary-husky
8b91d2ac0a add milvus vector store 2024-09-08 15:19:03 +00:00
binary-husky
e4e00b713f fix llamaindex version 2024-09-05 05:21:10 +00:00
binary-husky
710a65522c add social worker (proto) 2024-09-02 15:55:06 +00:00
binary-husky
34784c1d40 Merge branch 'rag' into frontier 2024-09-02 15:01:12 +00:00
binary-husky
80b1a6f99b rag beta release 2024-09-02 15:00:47 +00:00
binary-husky
08c3c56f53 rag version one 2024-08-28 15:14:13 +00:00
binary-husky
294716c832 begin rag project with llama index 2024-08-21 14:24:37 +00:00
binary-husky
16f4fd636e update ref 2024-08-19 16:14:52 +00:00
binary-husky
e07caf7a69 update openai api key pattern 2024-08-19 15:59:20 +00:00
moetayuko
a95b3daab9 fix loading chatglm3 (#1937)
* update welcome svg

* update welcome message

* fix loading chatglm3

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
2024-08-19 23:32:45 +08:00
binary-husky
4873e9dfdc update translation matrix 2024-08-12 13:50:37 +00:00
moetayuko
a119ab36fe fix enabling sparkv4 (#1936) 2024-08-12 21:45:08 +08:00
FatShibaInu
f9384e4e5f Add Support for Gemini 1.5 Pro & Gemini 1.5 Flash (#1926)
* Add Support for Gemini 1.5 Pro & 1.5 Flash.

* Update bridge_all.py

fix a spelling error in comments.

* Add Support for Gemini 1.5 Pro & Gemini 1.5 Flash
2024-08-12 21:44:24 +08:00
binary-husky
6fe5f6ee6e update welcome message 2024-08-05 11:37:06 +00:00
binary-husky
068d753426 update welcome svg 2024-08-04 15:59:09 +00:00
binary-husky
5010537f3c update welcome svg 2024-08-04 15:58:32 +00:00
binary-husky
f35f6633e0 fix: welcome card flip bug 2024-08-02 11:20:41 +00:00
hongyi-zhao
573dc4d184 Add claude-3-5-sonnet-20240620 (#1907)
See https://docs.anthropic.com/en/docs/about-claude/models#model-names fore model names.
2024-08-02 18:04:42 +08:00
binary-husky
da8b2d69ce update version 3.8 2024-08-02 10:02:04 +00:00
binary-husky
58e732c26f Merge branch 'frontier' 2024-08-02 09:50:40 +00:00
Menghuan1918
ca238daa8c 改进联网搜索插件-新增搜索模式,搜索增强 (#1874)
* Change default to Mixed option

* Add option optimizer

* Add search optimizer prompts

* Enhanced Processing

* Finish search_optimizer part

* prompts bug fix

* Bug fix
2024-07-23 00:55:48 +08:00
jiangfy-ihep
60b3491513 add gpt-4o-mini (#1904)
Co-authored-by: Fayu Jiang <jiangfayu@hotmail.com>
2024-07-23 00:55:34 +08:00
binary-husky
c1175bfb7d add flip card animation 2024-07-22 04:53:59 +00:00
binary-husky
b705afd5ff welcome menu bug fix 2024-07-22 04:35:52 +00:00
binary-husky
dfcd28abce add width_to_hide_welcome 2024-07-22 03:34:35 +00:00
binary-husky
1edaa9e234 hide when too narrow 2024-07-21 15:04:38 +00:00
binary-husky
f0cd617ec2 minor css improve 2024-07-20 10:29:47 +00:00
binary-husky
0b08bb2cea update svg 2024-07-20 07:15:08 +00:00
Keldos
d1f8607ac8 Update submit button dropdown style (#1900) 2024-07-20 14:50:56 +08:00
binary-husky
7eb68a2086 tune 2024-07-17 17:16:34 +00:00
binary-husky
ee9e99036a Merge branch 'frontier' of github.com:binary-husky/chatgpt_academic into frontier 2024-07-17 17:14:49 +00:00
binary-husky
55e255220b update 2024-07-17 17:12:32 +00:00
lbykkkk
019cd26ae8 Merge branch 'frontier' of https://github.com/binary-husky/gpt_academic into frontier 2024-07-18 00:35:51 +08:00
lbykkkk
a5b21d5cc0 修改content并统一logo颜色 2024-07-18 00:35:40 +08:00
binary-husky
ce940ff70f roll welcome msg 2024-07-17 16:34:24 +00:00
binary-husky
fc6a83c29f update 2024-07-17 15:44:08 +00:00
binary-husky
1d3212e367 reverse welcome msg 2024-07-17 15:43:41 +00:00
lbykkkk
8a835352a3 更新欢迎界面的用语和logo 2024-07-17 19:49:07 +08:00
binary-husky
5456c9fa43 improve welcome UI 2024-07-16 16:23:07 +00:00
binary-husky
ea67054c30 update chuanhu theme 2024-07-16 16:07:46 +00:00
binary-husky
1084108df6 adding welcome page 2024-07-16 10:41:25 +00:00
binary-husky
40c9700a8d add welcome page 2024-07-15 15:47:24 +00:00
binary-husky
6da5623813 多用途复用提交按钮 2024-07-15 04:23:43 +00:00
binary-husky
778c9cd9ec roll version 2024-07-15 03:29:56 +00:00
binary-husky
e290317146 proxy submit btn 2024-07-15 03:28:59 +00:00
binary-husky
85b92b7f07 move python comment agent to dropdown 2024-07-13 16:26:36 +00:00
binary-husky
ff899777ce improve source code comment plugin functionality 2024-07-13 16:20:17 +00:00
binary-husky
c1b8c773c3 stage compare source code comment 2024-07-13 15:28:53 +00:00
binary-husky
8747c48175 mt improvement 2024-07-12 08:26:40 +00:00
binary-husky
c0010c88bc implement auto comment 2024-07-12 07:36:40 +00:00
binary-husky
68838da8ad finish test 2024-07-12 04:19:07 +00:00
binary-husky
ca7de8fcdd version up 2024-07-10 02:00:36 +00:00
binary-husky
7ebc2d00e7 Merge branch 'master' into frontier 2024-07-09 03:19:35 +00:00
binary-husky
47fb81cfde Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2024-07-09 03:18:19 +00:00
binary-husky
83961c1002 optimize image generation fn 2024-07-09 03:18:14 +00:00
binary-husky
a8621333af js impl bug fix 2024-07-08 15:50:12 +00:00
binary-husky
f402ef8134 hide ask btn 2024-07-08 15:15:30 +00:00
binary-husky
65d0f486f1 change cache to lru_cache for lower python version 2024-07-07 16:02:05 +00:00
binary-husky
41f25a6a9b Merge branch 'bold_frontier' into frontier 2024-07-04 14:16:08 +00:00
binary-husky
4a6a032334 ignore 2024-07-04 14:14:49 +00:00
binary-husky
f945a7bd19 preserve theme selection 2024-07-04 14:11:51 +00:00
binary-husky
379dcb2fa7 minor gui bug fix 2024-07-04 13:31:21 +00:00
Menghuan1918
114192e025 Bug fix: can not chat with deepseek (#1879) 2024-07-04 20:28:53 +08:00
binary-husky
30c905917a unify plugin calling 2024-07-02 15:32:40 +00:00
binary-husky
0c6c357e9c revise qwen 2024-07-02 14:22:45 +00:00
binary-husky
9d11b17f25 Merge branch 'master' into frontier 2024-07-02 08:06:34 +00:00
binary-husky
1d9e9fa6a1 new page btn 2024-07-01 16:27:23 +00:00
Menghuan1918
6cd2d80dfd Bug fix: Some non-standard forms of error return are not caught (#1877) 2024-07-01 20:35:49 +08:00
binary-husky
18d3245fc9 ready next gradio version 2024-06-29 15:29:48 +00:00
hcy2206
194e665a3b 增加了对于讯飞星火大模型Spark4.0的支持 (#1875) 2024-06-29 23:20:04 +08:00
binary-husky
7e201c5028 move test file to correct position 2024-06-28 08:23:40 +00:00
binary-husky
6babcb4a9c Merge branch 'master' into frontier 2024-06-27 06:52:03 +00:00
binary-husky
00e5a31b50 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2024-06-27 06:50:06 +00:00
binary-husky
d8b9686eeb fix latex auto correct 2024-06-27 06:49:36 +00:00
binary-husky
b7b4e201cb fix latex auto correct 2024-06-27 06:49:10 +00:00
binary-husky
26e7677dc3 fix new api for taichu 2024-06-26 15:18:11 +00:00
Menghuan1918
25e06de1b6 Docker build bug fix (#1870) 2024-06-26 14:31:31 +08:00
binary-husky
5e64a50898 Merge branch 'master' into frontier 2024-06-25 11:43:40 +00:00
binary-husky
0ad571e6b5 prevent further stream when reset is clicked 2024-06-25 11:43:14 +00:00
binary-husky
60a42fb070 Merge branch 'master' into frontier 2024-06-25 11:14:32 +00:00
binary-husky
ddad5247fc upgrade searxng 2024-06-25 11:12:51 +00:00
binary-husky
c94d5054a2 move fn 2024-06-25 08:53:28 +00:00
binary-husky
ececfb9b6e test new dropdown js code 2024-06-25 08:34:50 +00:00
binary-husky
9f13c5cedf update default value of scroller_max_len 2024-06-25 05:34:55 +00:00
binary-husky
68b36042ce re-locate plugin 2024-06-25 05:32:20 +00:00
binary-husky
cac6c50d2f roll version 2024-06-19 12:56:23 +00:00
binary-husky
f884eb43cf Merge branch 'master' into frontier 2024-06-19 12:56:04 +00:00
binary-husky
d37383dd4e change arxiv cache dir path 2024-06-19 12:49:34 +00:00
binary-husky
dfae4e8081 optimize scolling visual effect 2024-06-19 12:42:11 +00:00
binary-husky
15cc08505f resolve safe pickle err 2024-06-19 11:59:47 +00:00
iluem
c5a82f6ab7 Merge pull request from GHSA-3jrq-66fm-w7xr 2024-06-19 14:29:21 +08:00
binary-husky
768ed4514a minor formatting issue 2024-06-18 14:51:53 +00:00
binary-husky
9dfbff7fd0 Merge branch 'GHSA-3jrq-66fm-w7xr' into frontier 2024-06-18 10:19:10 +00:00
binary-husky
47cedde954 fix security issue GHSA-3jrq-66fm-w7xr 2024-06-18 10:18:33 +00:00
binary-husky
1e16485087 internet gpt minor bug fix 2024-06-16 15:16:24 +00:00
binary-husky
f3660d669f internet GPT upgrade 2024-06-16 14:10:38 +00:00
binary-husky
e6d1cb09cb Merge branch 'master' into frontier 2024-06-16 13:47:15 +00:00
binary-husky
12aebf9707 searxng based information gathering 2024-06-16 12:12:57 +00:00
binary-husky
0b5385e5e5 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2024-06-12 09:34:12 +00:00
binary-husky
2ff1a1fb0b update translation matrix 2024-06-12 09:34:05 +00:00
Yuki
cdadd38cf7 ️feat: block access to openapi references while running under fastapi (#1849)
- block fastapi openapi reference(swagger and redoc) routes
2024-06-10 22:26:46 +08:00
binary-husky
48e10fb10a Update README.md 2024-06-10 22:22:04 +08:00
binary-husky
ba484c55a0 Merge branch 'master' into frontier 2024-06-10 14:19:26 +00:00
Frank Lee
ca64a592f5 Update zhipu models (#1852) 2024-06-10 22:17:51 +08:00
Guoxin Sun
cb96ca132a Update common.js (#1854)
fix typo
2024-06-10 22:17:27 +08:00
binary-husky
737101b81d remove debug msg 2024-06-07 17:00:05 +00:00
binary-husky
612caa2f5f revise 2024-06-07 16:50:27 +00:00
binary-husky
85dbe4a4bf pdf processing improvement 2024-06-07 15:53:08 +00:00
binary-husky
2262a4d80a taichu model fix 2024-06-06 09:35:05 +00:00
binary-husky
b456ff02ab add note 2024-06-06 09:14:32 +00:00
binary-husky
24a21ae320 紫东太初大模型 2024-06-06 09:05:06 +00:00
binary-husky
3d5790cc2c resolve fallback to non-multimodal problem 2024-06-06 08:00:30 +00:00
binary-husky
7de6015800 multimodal support for gpt-4o etc 2024-06-06 07:36:37 +00:00
binary-husky
46428b7c7a Merge branch 'master' into frontier 2024-06-01 16:22:32 +00:00
binary-husky
66a50c8019 live2d shutdown bug fix 2024-06-01 16:21:04 +00:00
Menghuan1918
814dc943ac 将“生成多种图表”插件高级参数更新为二级菜单 (#1839)
* Improve the prompts

* Update to new meun form

* Bug fix (wrong type of plugin_kwargs)
2024-06-01 13:34:33 +08:00
binary-husky
96cd1f0b25 secondary menu main input sync bug fix 2024-05-31 04:13:27 +00:00
binary-husky
4fc17f4add Merge branch 'master' into frontier 2024-05-30 15:00:44 +00:00
binary-husky
b3665d8fec remove check 2024-05-30 14:54:50 +00:00
binary-husky
80c4281888 TTS Default Enable 2024-05-30 14:27:18 +00:00
binary-husky
beda56abb0 update dockerfile 2024-05-30 12:44:17 +00:00
binary-husky
cb16941d01 update css 2024-05-30 12:35:47 +00:00
binary-husky
5cf9ac7849 Merge branch 'master' into frontier 2024-05-29 16:06:28 +00:00
binary-husky
51ddb88ceb correct hint err 2024-05-29 16:05:23 +00:00
binary-husky
69dfe5d514 compat to old void-terminal plugin 2024-05-29 15:50:00 +00:00
binary-husky
6819f87512 Merge branch 'frontier' of github.com:binary-husky/chatgpt_academic into frontier 2024-05-23 16:35:20 +00:00
binary-husky
3d51b9d5bb compat baichuan 2024-05-23 16:35:15 +00:00
QiyuanChen
bff87ada92 添加对ERNIE-Speed和ERNIE-Lite模型的支持 (#1821)
* feat: add ERNIE-Speed and ERNIE-Lite

百度的ERNIE-Speed and ERNIE-Lite模型开始免费使用了,故添加了调用地址。可以使用ERNIE-Speed-128K,ERNIE-Speed-8K,ERNIE-Lite-8K来访问

* chore: Modify supported models in config.py

修改了config.py中千帆支持的模型列表,添加了三款免费模型
2024-05-24 00:16:26 +08:00
binary-husky
a938412b6f save conversation wrap 2024-05-23 15:58:59 +00:00
binary-husky
a48acf6fec Flex Btn Bug Fix 2024-05-22 08:38:40 +00:00
binary-husky
c6b9ab5214 add document 2024-05-22 06:39:56 +00:00
binary-husky
aa3332de69 add document 2024-05-22 06:27:26 +00:00
binary-husky
d43175d46d fix type hint 2024-05-21 13:18:38 +00:00
binary-husky
8ca9232db2 Merge branch 'master' into frontier 2024-05-21 12:27:01 +00:00
binary-husky
1339aa0e1a doc2x latex convertion 2024-05-21 12:24:50 +00:00
binary-husky
f41419e767 update demo 2024-05-21 11:12:08 +00:00
binary-husky
d88c585305 improve latex plugin 2024-05-21 10:47:50 +00:00
binary-husky
0a88d18c7a secondary menu for pdf trans 2024-05-21 08:51:29 +00:00
binary-husky
0d0edc2216 Merge branch 'frontier' of github.com:binary-husky/chatgpt_academic into frontier 2024-05-19 21:54:16 +08:00
binary-husky
5e0875fcf4 from backend to front end 2024-05-19 21:54:06 +08:00
Shixian Sheng
c508b84db8 更新了README.md/Update README.md (#1810) 2024-05-19 20:41:17 +08:00
Menghuan1918
f2b67602bb 为docker构建添加FFmpeg依赖 (#1807)
* Test: change dockerfile to install ffmpeg

* Add the ffmpeg to dockerfile (required by edge-tts)
2024-05-19 14:27:55 +08:00
binary-husky
29daba5d2f success? 2024-05-18 23:03:28 +08:00
binary-husky
9477824ac1 improve css 2024-05-18 21:54:15 +08:00
binary-husky
459c5b2d24 plugin refactor: phase 1 2024-05-18 20:23:50 +08:00
binary-husky
abf9b5aee5 Merge branch 'master' into frontier 2024-05-18 15:52:08 +08:00
binary-husky
2ce4482146 fix new ModelOverride fn bug 2024-05-18 15:47:25 +08:00
binary-husky
4282b83035 change TTS default to DISABLE 2024-05-18 15:43:35 +08:00
binary-husky
537be57c9b fix tts bugs 2024-05-17 21:07:28 +08:00
binary-husky
3aa92d6c80 change main ui hint 2024-05-17 11:34:13 +08:00
awwaawwa
b7eb9aba49 [Feature]: allow model mutex override in core_functional.py (#1708)
* allow_core_func_specify_model

* change arg name

* 模型覆盖支持热更新&当模型覆盖指向不存在的模型时报错

* allow model mutex override

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-05-17 11:15:23 +08:00
hongyi-zhao
881a596a30 model support (gpt4o) in project. (#1760)
* Add the environment variable: OPEN_BROWSER

* Add configurable browser launching with custom arguments

- Update `config.py` to include options for specifying the browser and its arguments for opening URLs.
- Modify `main.py` to use the configured browser settings from `config.py` to launch the web page.
- Enhance `config_loader.py` to process path-like strings by expanding and normalizing paths, which supports the configuration improvements.

* Add support for the following models:

"gpt-4o", "gpt-4o-2024-05-13"

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-05-14 17:01:32 +08:00
binary-husky
1b3c331d01 dos2unix 2024-05-14 12:02:40 +08:00
binary-husky
70d5f2a7df arg name err patch 2024-05-13 23:40:35 +08:00
Menghuan1918
fd2f8b9090 Provide a new fast and simple way of accessing APIs (As example: Yi-models,Deepseek) (#1782)
* deal with the message part

* Finish no_ui_connect

* finish predict part

* Delete old version

* An example of add new api

* Bug fix:can not change in "model_info"

* Bug fix

* Error message handling

* Clear the format

* An example of add a openai form API:Deepseek

* For compatibility reasons

* Feture: set different API/Endpoint to diferent models

* Add support for YI new models

* 更新doc2x的api key机制 (#1766)

* Fix DOC2X API key refresh issue in PDF translation

* remove add

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* 修改部分文件名、变量名

* patch err

---------

Co-authored-by: alex_xiao <113411296+Alex4210987@users.noreply.github.com>
Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-05-13 23:38:08 +08:00
binary-husky
225a2de011 Version 3.76 (#1752)
* version roll

* add upload processbar
2024-05-13 22:54:38 +08:00
binary-husky
6aea6d8e2b Merge branch 'master' into frontier 2024-05-13 22:52:15 +08:00
alex_xiao
8d85616c27 更新doc2x的api key机制 (#1766)
* Fix DOC2X API key refresh issue in PDF translation

* remove add

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-05-13 22:49:40 +08:00
binary-husky
e4533dd24d Merge branch 'master' into frontier 2024-05-04 17:00:09 +08:00
binary-husky
43ed8cb8a8 Fix fastapi version compat 2024-05-04 16:43:42 +08:00
binary-husky
3eff964424 Update README.md 2024-05-01 17:59:25 +08:00
OREEkE
ebde98b34b Update requirements.txt (#1753)
TTS_TYPE = "EDGE_TTS"需要的依赖
2024-05-01 14:55:04 +08:00
binary-husky
6f883031c0 Update config.py 2024-05-01 14:54:36 +08:00
binary-husky
fa15059f07 add upload processbar 2024-05-01 01:11:35 +08:00
binary-husky
685c573619 version roll 2024-04-30 21:00:25 +08:00
binary-husky
5fcd02506c version 3.75 (#1702)
* Update version to 3.74

* Add support for Yi Model API (#1635)

* 更新以支持零一万物模型

* 删除newbing

* 修改config

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* Refactor function signatures in bridge files

* fix qwen api change

* rename and ref functions

* rename and move some cookie functions

* 增加haiku模型,新增endpoint配置说明 (#1626)

* haiku added

* 新增haiku,新增endpoint配置说明

* Haiku added

* 将说明同步至最新Endpoint

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* private_upload目录下进行文件鉴权 (#1596)

* private_upload目录下进行文件鉴权

* minor fastapi adjustment

* Add logging functionality to enable saving
conversation records

* waiting to fix username retrieve

* support 2rd web path

* allow accessing default user dir

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* remove yaml deps

* fix favicon

* fix abs path auth problem

* forget to write a return

* add `dashscope` to deps

* fix GHSA-v9q9-xj86-953p

* 用户名重叠越权访问patch (#1681)

* add cohere model api access

* cohere + can_multi_thread

* fix block user access(fail)

* fix fastapi bug

* change cohere api endpoint

* explain version

* # fix com_zhipuglm.py illegal temperature problem (#1687)

* Update com_zhipuglm.py

# fix 用户在使用 zhipuai 界面时遇到了关于温度参数的非法参数错误

* allow store lm model dropdown

* add a btn to reverse previous reset

* remove extra fns

* Add support for glm-4v model (#1700)

* 修改chatglm3量化加载方式 (#1688)

Co-authored-by: zym9804 <ren990603@gmail.com>

* save chat stage 1

* consider null cookie situation

* 在点击复制按钮时激活语音

* miss some parts

* move all to js

* done first stage

* add edge tts

* bug fix

* bug fix

* remove console log

* bug fix

* bug fix

* bug fix

* audio switch

* update tts readme

* remove tempfile when done

* disable auto audio follow

* avoid play queue update after shut up

* feat: minimizing common.js

* improve tts functionality

* deterine whether the cached model is in choices

* Add support for Ollama (#1740)

* print err when doc2x not successful

* add icon

* adjust url for doc2x key version

* prepare merge

---------

Co-authored-by: Menghuan1918 <menghuan2003@outlook.com>
Co-authored-by: Skyzayre <120616113+Skyzayre@users.noreply.github.com>
Co-authored-by: XIao <46100050+Kilig947@users.noreply.github.com>
Co-authored-by: Yuki <903728862@qq.com>
Co-authored-by: zyren123 <91042213+zyren123@users.noreply.github.com>
Co-authored-by: zym9804 <ren990603@gmail.com>
2024-04-30 20:37:41 +08:00
binary-husky
bd5280df1b minor pdf translation adjustment 2024-04-30 00:52:36 +08:00
binary-husky
744759704d allow personal docx api access 2024-04-29 23:53:41 +08:00
WFS
81df0aa210 fix the issue of when using google Gemini pro, don't have chat histor… (#1743)
* fix the issue of when using google Gemini pro, don't have chat history record

just add chat_log in bridge_google_gmini.py

* Update bridge_google_gemini.py

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
2024-04-25 22:26:32 +08:00
Menghuan1918
cadaa81030 Fix the bug cause Nougat can not use (#1738)
* Bug fix for nougat require pdf

* Fixing bugs in a simpler and safer way
2024-04-24 12:13:44 +08:00
binary-husky
3b6cbbdcb0 Update README.md (#1736) 2024-04-24 11:41:56 +08:00
binary-husky
52e49c48b8 the latest zhipuai whl is broken 2024-04-23 18:20:36 +08:00
binary-husky
6ad15a6129 fix equation showing problem 2024-04-22 01:54:03 +08:00
binary-husky
09990d44d3 merge to resolve multiple pickle security issues (#1728)
* 注释调试if分支

* support pdf url for latex translation

* Merge pull request from GHSA-mvrw-h7rc-22r8

* 注释调试if分支

* Improve objload security

* Update README.md

* support pdf url for latex translation

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* fix import

---------

Co-authored-by: Longtaotao <longtaotao@bupt.edu.cn>
Co-authored-by: iluem <57590186+Qhaoduoyu@users.noreply.github.com>
2024-04-21 19:37:05 +08:00
binary-husky
eac5191815 Update README.md 2024-04-21 02:12:15 +08:00
owo
ae4407135d fix: 添加report_exception中缺失的a参数 (#1720)
在report_exception函数的定义中,参数a未包含默认值,因此应提供相应的值传入。
2024-04-18 16:27:00 +08:00
owo
f0e15bd710 fix: 修复了在else语句中调用'schema_str'之前未定义的问题 (#1719)
重新排列了方法中的条件返回语句,以确保在使用之前始终定义了'schema_str'。
2024-04-18 16:26:13 +08:00
jiangfy-ihep
5c5f442649 Fix: openai project API key pattern (#1721)
Co-authored-by: Fayu Jiang <jiangfayu@hotmail.com>
2024-04-18 16:24:29 +08:00
binary-husky
160552cc5f introduce doc2x 2024-04-15 01:57:31 +08:00
binary-husky
c131ec0b20 rename pdf plugin file name 2024-04-14 22:46:31 +08:00
iluem
2f3aeb7976 Merge pull request from GHSA-23cr-v6pm-j89p
* Update crazy_utils.py

Improve security

* add a white space

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
2024-04-14 21:51:03 +08:00
binary-husky
eff5b89b98 scan first, then extract 2024-04-14 21:36:57 +08:00
iluem
f77ab27bc9 Merge pull request from GHSA-rh7j-jfvq-857j
Prevent path traversal for improved security
2024-04-14 21:33:37 +08:00
awwaawwa
ba0a8b7072 integrate gpt-4-turbo-2024-04-09 (#1698)
* 接入 gpt-4-turbo-2024-04-09 模型

* add gpt-4-turbo and change to vision

* add gpt-4-turbo to avail llm models

* 暂时将gpt-4-turbo接入至普通版本
2024-04-11 22:02:40 +08:00
hmp
2406022c2a access vllm 2024-04-11 22:00:07 +08:00
OREEkE
02b6f26b05 remove logging in gradios.py (#1699)
如果初始主题是HF社区主题,这里使用logging会导致程序不再写入日志(包括对话内容在内的任何记录),下载主题的日志输出和程序启动时的日志初始化有冲突。
2024-04-11 14:15:12 +08:00
OREEkE
2a003e8d49 add loadLive2D() when ADD_WAIFU = False (#1693)
ADD_WAIFU = False,浏览器会抛出错误:[Error] JQuery is not defined. 因为这时候没有jQuery库可用,却依然使用了loadLive2D()函数。现在加一个判断,如果ADD_WAIFU = False,禁用jQuery库的同时也禁用loadLive2D()函数,除非ADD_WAIFU = True
2024-04-10 00:10:53 +08:00
binary-husky
21891b0f6d update translate matrix 2024-04-08 12:43:24 +08:00
Yuki
163f12c533 # fix com_zhipuglm.py illegal temperature problem (#1687)
* Update com_zhipuglm.py

# fix 用户在使用 zhipuai 界面时遇到了关于温度参数的非法参数错误
2024-04-08 12:17:07 +08:00
binary-husky
bdd46c5dd1 Version 3.74: Merge latest updates on dev branch (frontier) (#1621)
* Update version to 3.74

* Add support for Yi Model API (#1635)

* 更新以支持零一万物模型

* 删除newbing

* 修改config

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* Refactor function signatures in bridge files

* fix qwen api change

* rename and ref functions

* rename and move some cookie functions

* 增加haiku模型,新增endpoint配置说明 (#1626)

* haiku added

* 新增haiku,新增endpoint配置说明

* Haiku added

* 将说明同步至最新Endpoint

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* private_upload目录下进行文件鉴权 (#1596)

* private_upload目录下进行文件鉴权

* minor fastapi adjustment

* Add logging functionality to enable saving
conversation records

* waiting to fix username retrieve

* support 2rd web path

* allow accessing default user dir

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* remove yaml deps

* fix favicon

* fix abs path auth problem

* forget to write a return

* add `dashscope` to deps

* fix GHSA-v9q9-xj86-953p

* 用户名重叠越权访问patch (#1681)

* add cohere model api access

* cohere + can_multi_thread

* fix block user access(fail)

* fix fastapi bug

* change cohere api endpoint

* explain version

---------

Co-authored-by: Menghuan1918 <menghuan2003@outlook.com>
Co-authored-by: Skyzayre <120616113+Skyzayre@users.noreply.github.com>
Co-authored-by: XIao <46100050+Kilig947@users.noreply.github.com>
2024-04-08 11:49:30 +08:00
binary-husky
ae51a0e686 fix GHSA-v9q9-xj86-953p 2024-04-05 20:47:11 +08:00
binary-husky
f2582ea137 fix qwen api change 2024-04-03 12:17:41 +08:00
binary-husky
ddd2fd84da fix checkbox bugs 2024-04-02 19:42:55 +08:00
binary-husky
6c90ff80ea add prompt and temperature to cookie 2024-04-02 18:02:00 +08:00
binary-husky
cb7c0703be Update requirements.txt (#1668) 2024-04-01 11:30:50 +08:00
binary-husky
5181cd441d change pip install url due to server failure (#1667) 2024-04-01 11:20:14 +08:00
binary-husky
216d4374e7 fix color list overflow 2024-04-01 00:11:32 +08:00
iluem
8af6c0cab6 Qhaoduoyu patch 1: pickle to json to increase security (#1648)
* Update theme.py

fix bugs

* Update theme.py

fix bugs

* change var names

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-03-25 09:54:30 +08:00
binary-husky
67ad041372 fix issue #1640 2024-03-20 18:09:37 +08:00
binary-husky
725c72229c update docker compose 2024-03-20 17:37:03 +08:00
Menghuan1918
e42ede512b Update Claude3 api request and fix some bugs (#1641)
* Update version to 3.74

* Add support for Yi Model API (#1635)

* 更新以支持零一万物模型

* 删除newbing

* 修改config

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* Update claude requrest to http type

* Update for endpoint

* Add support for other tpyes of pictures

* Update pip packages

* Fix console_slience issue while error handling

* revert version changes

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-03-20 17:22:23 +08:00
binary-husky
84ccc9e64c fix claude + oneapi error 2024-03-17 14:53:28 +08:00
binary-husky
c172847e19 add python annotations for toolbox functions 2024-03-16 22:54:33 +08:00
binary-husky
d166d25eb4 resolve invalid escape sequence warning
to support python3.12
2024-03-11 18:10:05 +08:00
binary-husky
516bbb1331 Update README.md 2024-03-11 17:40:16 +08:00
binary-husky
c3140ce344 merge frontier branch (#1620)
* Zhipu sdk update 适配最新的智谱SDK,支持GLM4v (#1502)

* 适配 google gemini 优化为从用户input中提取文件

* 适配最新的智谱SDK、支持glm-4v

* requirements.txt fix

* pending history check

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* Update "生成多种Mermaid图表" plugin: Separate out the file reading function (#1520)

* Update crazy_functional.py with new functionality deal with PDF

* Update crazy_functional.py and Mermaid.py for plugin_kwargs

* Update crazy_functional.py with new chart type: mind map

* Update SELECT_PROMPT and i_say_show_user messages

* Update ArgsReminder message in get_crazy_functions() function

* Update with read md file and update PROMPTS

* Return the PROMPTS as the test found that the initial version worked best

* Update Mermaid chart generation function

* version 3.71

* 解决issues #1510

* Remove unnecessary text from sys_prompt in 解析历史输入 function

* Remove sys_prompt message in 解析历史输入 function

* Update bridge_all.py: supports gpt-4-turbo-preview (#1517)

* Update bridge_all.py: supports gpt-4-turbo-preview

supports gpt-4-turbo-preview

* Update bridge_all.py

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* Update config.py: supports gpt-4-turbo-preview (#1516)

* Update config.py: supports gpt-4-turbo-preview

supports gpt-4-turbo-preview

* Update config.py

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* Refactor 解析历史输入 function to handle file input

* Update Mermaid chart generation functionality

* rename files and functions

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: hongyi-zhao <hongyi.zhao@gmail.com>
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* 接入mathpix ocr功能 (#1468)

* Update Latex输出PDF结果.py

借助mathpix实现了PDF翻译中文并重新编译PDF

* Update config.py

add mathpix appid & appkey

* Add 'PDF翻译中文并重新编译PDF' feature to plugins.

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* fix zhipuai

* check picture

* remove glm-4 due to bug

* 修改config

* 检查MATHPIX_APPID

* Remove unnecessary code and update
function_plugins dictionary

* capture non-standard token overflow

* bug fix #1524

* change mermaid style

* 支持mermaid 滚动放大缩小重置,鼠标滚动和拖拽 (#1530)

* 支持mermaid 滚动放大缩小重置,鼠标滚动和拖拽

* 微调未果 先stage一下

* update

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* ver 3.72

* change live2d

* save the status of ``clear btn` in cookie

* 前端选择保持

* js ui bug fix

* reset btn bug fix

* update live2d tips

* fix missing get_token_num method

* fix live2d toggle switch

* fix persistent custom btn with cookie

* fix zhipuai feedback with core functionality

* Refactor button update and clean up functions

* tailing space removal

* Fix missing MATHPIX_APPID and MATHPIX_APPKEY
configuration

* Prompt fix、脑图提示词优化 (#1537)

* 适配 google gemini 优化为从用户input中提取文件

* 脑图提示词优化

* Fix missing MATHPIX_APPID and MATHPIX_APPKEY
configuration

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* 优化“PDF翻译中文并重新编译PDF”插件 (#1602)

* Add gemini_endpoint to API_URL_REDIRECT (#1560)

* Add gemini_endpoint to API_URL_REDIRECT

* Update gemini-pro and gemini-pro-vision model_info
endpoints

* Update to support new claude models (#1606)

* Add anthropic library and update claude models

* 更新bridge_claude.py文件,添加了对图片输入的支持。修复了一些bug。

* 添加Claude_3_Models变量以限制图片数量

* Refactor code to improve readability and
maintainability

* minor claude bug fix

* more flexible one-api support

* reformat config

* fix one-api new access bug

* dummy

* compat non-standard api

* version 3.73

---------

Co-authored-by: XIao <46100050+Kilig947@users.noreply.github.com>
Co-authored-by: Menghuan1918 <menghuan2003@outlook.com>
Co-authored-by: hongyi-zhao <hongyi.zhao@gmail.com>
Co-authored-by: Hao Ma <893017927@qq.com>
Co-authored-by: zeyuan huang <599012428@qq.com>
2024-03-11 17:26:09 +08:00
binary-husky
cd18663800 compat non-standard api - 2 2024-03-10 17:13:54 +08:00
binary-husky
dbf1322836 compat non-standard api 2024-03-10 17:07:59 +08:00
XIao
98dd3ae1c0 Moonshot- 在config.py中增加可用模型 (#1603)
* 支持月之暗面api

* fix文案

* 优化noui的返回值,对话历史文件继续上传到moonshat

* fix

* config 可用模型配置增加

* add `can_multi_thread` model attr (#1598)

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
Co-authored-by: binary-husky <qingxu.fu@outlook.com>
2024-03-05 16:07:05 +08:00
binary-husky
3036709496 add can_multi_thread model attr (#1598) 2024-03-05 15:58:18 +08:00
XIao
8e9c07644f 支持月之暗面api,文件对话 (#1597)
* 支持月之暗面api

* fix文案
2024-03-03 23:42:17 +08:00
binary-husky
90d96b77e6 handle qianfan chat error 2024-02-29 00:36:06 +08:00
binary-husky
66c876a9ca Update README.md 2024-02-26 22:56:09 +08:00
binary-husky
0665eb75ed Update README.md (#1581) 2024-02-26 22:52:00 +08:00
binary-husky
6b784035fa Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2024-02-25 21:13:56 +08:00
binary-husky
8bb3d84912 fix zip chinese file name error 2024-02-25 21:13:41 +08:00
binary-husky
a0193cf227 edit dep url 2024-02-23 13:28:49 +08:00
binary-husky
b72289bfb0 Fix missing MATHPIX_APPID and MATHPIX_APPKEY
configuration
2024-02-21 14:20:10 +08:00
Menghuan1918
bdfe3862eb 添加部分翻译 (#1566) 2024-02-21 14:14:06 +08:00
binary-husky
dae180b9ea update spark v3.5, fix glm parallel problem 2024-02-18 14:08:35 +08:00
binary-husky
e359fff040 Fix response message bug in bridge_qianfan.py,
bridge_qwen.py, and bridge_skylark2.py
2024-02-15 00:02:24 +08:00
binary-husky
2e9b4a5770 Merge Frontier, Update to Version 3.72 (#1553)
* Zhipu sdk update 适配最新的智谱SDK,支持GLM4v (#1502)

* 适配 google gemini 优化为从用户input中提取文件

* 适配最新的智谱SDK、支持glm-4v

* requirements.txt fix

* pending history check

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>

* Update "生成多种Mermaid图表" plugin: Separate out the file reading function (#1520)

* Update crazy_functional.py with new functionality deal with PDF

* Update crazy_functional.py and Mermaid.py for plugin_kwargs

* Update crazy_functional.py with new chart type: mind map

* Update SELECT_PROMPT and i_say_show_user messages

* Update ArgsReminder message in get_crazy_functions() function

* Update with read md file and update PROMPTS

* Return the PROMPTS as the test found that the initial version worked best

* Update Mermaid chart generation function

* version 3.71

* 解决issues #1510

* Remove unnecessary text from sys_prompt in 解析历史输入 function

* Remove sys_prompt message in 解析历史输入 function

* Update bridge_all.py: supports gpt-4-turbo-preview (#1517)

* Update bridge_all.py: supports gpt-4-turbo-preview

supports gpt-4-turbo-preview

* Update bridge_all.py

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* Update config.py: supports gpt-4-turbo-preview (#1516)

* Update config.py: supports gpt-4-turbo-preview

supports gpt-4-turbo-preview

* Update config.py

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* Refactor 解析历史输入 function to handle file input

* Update Mermaid chart generation functionality

* rename files and functions

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: hongyi-zhao <hongyi.zhao@gmail.com>
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* 接入mathpix ocr功能 (#1468)

* Update Latex输出PDF结果.py

借助mathpix实现了PDF翻译中文并重新编译PDF

* Update config.py

add mathpix appid & appkey

* Add 'PDF翻译中文并重新编译PDF' feature to plugins.

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* fix zhipuai

* check picture

* remove glm-4 due to bug

* 修改config

* 检查MATHPIX_APPID

* Remove unnecessary code and update
function_plugins dictionary

* capture non-standard token overflow

* bug fix #1524

* change mermaid style

* 支持mermaid 滚动放大缩小重置,鼠标滚动和拖拽 (#1530)

* 支持mermaid 滚动放大缩小重置,鼠标滚动和拖拽

* 微调未果 先stage一下

* update

---------

Co-authored-by: binary-husky <qingxu.fu@outlook.com>
Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>

* ver 3.72

* change live2d

* save the status of ``clear btn` in cookie

* 前端选择保持

* js ui bug fix

* reset btn bug fix

* update live2d tips

* fix missing get_token_num method

* fix live2d toggle switch

* fix persistent custom btn with cookie

* fix zhipuai feedback with core functionality

* Refactor button update and clean up functions

---------

Co-authored-by: XIao <46100050+Kilig947@users.noreply.github.com>
Co-authored-by: Menghuan1918 <menghuan2003@outlook.com>
Co-authored-by: hongyi-zhao <hongyi.zhao@gmail.com>
Co-authored-by: Hao Ma <893017927@qq.com>
Co-authored-by: zeyuan huang <599012428@qq.com>
2024-02-14 18:35:09 +08:00
binary-husky
e0c5859cf9 update Column min_width parameter 2024-02-12 23:37:31 +08:00
binary-husky
b9b1e12dc9 fix missing get_token_num method 2024-02-12 15:58:55 +08:00
binary-husky
8814026ec3 fix gradio-client version (#1548) 2024-02-09 13:25:01 +08:00
binary-husky
3025d5be45 remove jsdelivr (#1547) 2024-02-09 13:17:14 +08:00
binary-husky
6c13bb7b46 patch issue #1538 2024-02-06 17:59:09 +08:00
binary-husky
c27e559f10 match sess-* key 2024-02-06 17:51:47 +08:00
binary-husky
cdb5288f49 fix issue #1532 2024-02-02 17:47:35 +08:00
hongyi-zhao
49c6fcfe97 Update config.py: supports gpt-4-turbo-preview (#1516)
* Update config.py: supports gpt-4-turbo-preview

supports gpt-4-turbo-preview

* Update config.py

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
2024-01-26 16:44:32 +08:00
hongyi-zhao
45fa0404eb Update bridge_all.py: supports gpt-4-turbo-preview (#1517)
* Update bridge_all.py: supports gpt-4-turbo-preview

supports gpt-4-turbo-preview

* Update bridge_all.py

---------

Co-authored-by: binary-husky <96192199+binary-husky@users.noreply.github.com>
2024-01-26 16:36:23 +08:00
共有 234 个文件被更改,包括 16447 次插入12531 次删除

查看文件

@@ -1,44 +0,0 @@
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
name: build-with-jittorllms
on:
push:
branches:
- 'master'
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}_jittorllms
jobs:
build-and-push-image:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Log in to the Container registry
uses: docker/login-action@v2
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v4
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
- name: Build and push Docker image
uses: docker/build-push-action@v4
with:
context: .
push: true
file: docs/GithubAction+JittorLLMs
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}

查看文件

@@ -1,14 +1,14 @@
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
name: build-with-all-capacity-beta
name: build-with-latex-arm
on:
push:
branches:
- 'master'
- "master"
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}_with_all_capacity_beta
IMAGE_NAME: ${{ github.repository }}_with_latex_arm
jobs:
build-and-push-image:
@@ -18,11 +18,17 @@ jobs:
packages: write
steps:
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Log in to the Container registry
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
@@ -35,10 +41,11 @@ jobs:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
- name: Build and push Docker image
uses: docker/build-push-action@v4
uses: docker/build-push-action@v6
with:
context: .
push: true
file: docs/GithubAction+AllCapacityBeta
platforms: linux/arm64
file: docs/GithubAction+NoLocal+Latex
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}

10
.gitignore vendored
查看文件

@@ -131,6 +131,9 @@ dmypy.json
# Pyre type checker
.pyre/
# macOS files
.DS_Store
.vscode
.idea
@@ -153,3 +156,10 @@ media
flagged
request_llms/ChatGLM-6b-onnx-u8s8
.pre-commit-config.yaml
test.*
temp.*
objdump*
*.min.*.js
TODO
experimental_mods
search_results

查看文件

@@ -12,11 +12,16 @@ RUN echo '[global]' > /etc/pip.conf && \
echo 'trusted-host = mirrors.aliyun.com' >> /etc/pip.conf
# 语音输出功能以下两行,第一行更换阿里源,第二行安装ffmpeg,都可以删除
RUN UBUNTU_VERSION=$(awk -F= '/^VERSION_CODENAME=/{print $2}' /etc/os-release); echo "deb https://mirrors.aliyun.com/debian/ $UBUNTU_VERSION main non-free contrib" > /etc/apt/sources.list; apt-get update
RUN apt-get install ffmpeg -y
# 进入工作路径(必要)
WORKDIR /gpt
# 安装大部分依赖,利用Docker缓存加速以后的构建 (以下行,可以删除)
# 安装大部分依赖,利用Docker缓存加速以后的构建 (以下行,可以删除)
COPY requirements.txt ./
RUN pip3 install -r requirements.txt

查看文件

@@ -1,7 +1,8 @@
> [!IMPORTANT]
> 2024.1.18: 更新3.70版本,支持Mermaid绘图库让大模型绘制脑图
> 2024.1.17: 恭迎GLM4,全力支持Qwen、GLM、DeepseekCoder等国内中文大语言基座模型
> 2024.1.17: 某些依赖包尚不兼容python 3.12,推荐python 3.11。
> 2024.10.10: 突发停电,紧急恢复了提供[whl包](https://drive.google.com/file/d/19U_hsLoMrjOlQSzYS3pzWX9fTzyusArP/view?usp=sharing)的文件服务器
> 2024.10.8: 版本3.90加入对llama-index的初步支持,版本3.80加入插件二级菜单功能详见wiki
> 2024.5.1: 加入Doc2x翻译PDF论文的功能,[查看详情](https://github.com/binary-husky/gpt_academic/wiki/Doc2x)
> 2024.3.11: 全力支持Qwen、GLM、DeepseekCoder等中文大语言模型 SoVits语音克隆模块,[查看详情](https://www.bilibili.com/video/BV1Rp421S7tF/)
> 2024.1.17: 安装依赖时,请选择`requirements.txt`中**指定的版本**。 安装命令:`pip install -r requirements.txt`。本项目完全开源免费,您可通过订阅[在线服务](https://github.com/binary-husky/gpt_academic/wiki/online)的方式鼓励本项目的发展。
<br>
@@ -67,7 +68,7 @@ Read this in [English](docs/README.English.md) | [日本語](docs/README.Japanes
读论文、[翻译](https://www.bilibili.com/video/BV1KT411x7Wn)论文 | [插件] 一键解读latex/pdf论文全文并生成摘要
Latex全文[翻译](https://www.bilibili.com/video/BV1nk4y1Y7Js/)、[润色](https://www.bilibili.com/video/BV1FT411H7c5/) | [插件] 一键翻译或润色latex论文
批量注释生成 | [插件] 一键批量生成函数注释
Markdown[中英互译](https://www.bilibili.com/video/BV1yo4y157jV/) | [插件] 看到上面5种语言的[README](https://github.com/binary-husky/gpt_academic/blob/master/docs/README_EN.md)了吗?就是出自他的手笔
Markdown[中英互译](https://www.bilibili.com/video/BV1yo4y157jV/) | [插件] 看到上面5种语言的[README](https://github.com/binary-husky/gpt_academic/blob/master/docs/README.English.md)了吗?就是出自他的手笔
[PDF论文全文翻译功能](https://www.bilibili.com/video/BV1KT411x7Wn) | [插件] PDF论文提取题目&摘要+翻译全文(多线程)
[Arxiv小助手](https://www.bilibili.com/video/BV1LM4y1279X) | [插件] 输入arxiv文章url即可一键翻译摘要+下载PDF
Latex论文一键校对 | [插件] 仿Grammarly对Latex文章进行语法、拼写纠错+输出对照PDF
@@ -87,6 +88,10 @@ Latex论文一键校对 | [插件] 仿Grammarly对Latex文章进行语法、拼
<img src="https://user-images.githubusercontent.com/96192199/279702205-d81137c3-affd-4cd1-bb5e-b15610389762.gif" width="700" >
</div>
<div align="center">
<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/70ff1ec5-e589-4561-a29e-b831079b37fb.gif" width="700" >
</div>
- 所有按钮都通过读取functional.py动态生成,可随意加自定义功能,解放剪贴板
<div align="center">
@@ -253,8 +258,7 @@ P.S. 如果需要依赖Latex的插件功能,请见Wiki。另外,您也可以
# Advanced Usage
### I自定义新的便捷按钮学术快捷键
任意文本编辑器打开`core_functional.py`,添加如下条目,然后重启程序。(如果按钮已存在,那么可以直接修改(前缀、后缀都已支持热修改),无需重启程序即可生效。)
例如
现在已可以通过UI中的`界面外观`菜单中的`自定义菜单`添加新的便捷按钮。如果需要在代码中定义,请使用任意文本编辑器打开`core_functional.py`,添加如下条目即可:
```python
"超级英译中": {

查看文件

@@ -1,37 +1,77 @@
from loguru import logger
def check_proxy(proxies):
def check_proxy(proxies, return_ip=False):
"""
检查代理配置并返回结果。
Args:
proxies (dict): 包含http和https代理配置的字典。
return_ip (bool, optional): 是否返回代理的IP地址。默认为False。
Returns:
str or None: 检查的结果信息或代理的IP地址如果`return_ip`为True
"""
import requests
proxies_https = proxies['https'] if proxies is not None else ''
ip = None
try:
response = requests.get("https://ipapi.co/json/", proxies=proxies, timeout=4)
response = requests.get("https://ipapi.co/json/", proxies=proxies, timeout=4) # ⭐ 执行GET请求以获取代理信息
data = response.json()
if 'country_name' in data:
country = data['country_name']
result = f"代理配置 {proxies_https}, 代理所在地:{country}"
if 'ip' in data:
ip = data['ip']
elif 'error' in data:
alternative = _check_with_backup_source(proxies)
alternative, ip = _check_with_backup_source(proxies) # ⭐ 调用备用方法检查代理配置
if alternative is None:
result = f"代理配置 {proxies_https}, 代理所在地未知,IP查询频率受限"
else:
result = f"代理配置 {proxies_https}, 代理所在地:{alternative}"
else:
result = f"代理配置 {proxies_https}, 代理数据解析失败:{data}"
print(result)
if not return_ip:
logger.warning(result)
return result
else:
return ip
except:
result = f"代理配置 {proxies_https}, 代理所在地查询超时,代理可能无效"
print(result)
if not return_ip:
logger.warning(result)
return result
else:
return ip
def _check_with_backup_source(proxies):
"""
通过备份源检查代理,并获取相应信息。
Args:
proxies (dict): 包含代理信息的字典。
Returns:
tuple: 代理信息(geo)和IP地址(ip)的元组。
"""
import random, string, requests
random_string = ''.join(random.choices(string.ascii_letters + string.digits, k=32))
try: return requests.get(f"http://{random_string}.edns.ip-api.com/json", proxies=proxies, timeout=4).json()['dns']['geo']
except: return None
try:
res_json = requests.get(f"http://{random_string}.edns.ip-api.com/json", proxies=proxies, timeout=4).json() # ⭐ 执行代理检查和备份源请求
return res_json['dns']['geo'], res_json['dns']['ip']
except:
return None, None
def backup_and_download(current_version, remote_version):
"""
一键更新协议:备份和下载
一键更新协议:备份当前版本,下载远程版本并解压缩。
Args:
current_version (str): 当前版本号。
remote_version (str): 远程版本号。
Returns:
str: 新版本目录的路径。
"""
from toolbox import get_conf
import shutil
@@ -47,8 +87,8 @@ def backup_and_download(current_version, remote_version):
shutil.copytree('./', backup_dir, ignore=lambda x, y: ['history'])
proxies = get_conf('proxies')
try: r = requests.get('https://github.com/binary-husky/chatgpt_academic/archive/refs/heads/master.zip', proxies=proxies, stream=True)
except: r = requests.get('https://public.gpt-academic.top/publish/master.zip', proxies=proxies, stream=True)
zip_file_path = backup_dir+'/master.zip'
except: r = requests.get('https://public.agent-matrix.com/publish/master.zip', proxies=proxies, stream=True)
zip_file_path = backup_dir+'/master.zip' # ⭐ 保存备份文件的路径
with open(zip_file_path, 'wb+') as f:
f.write(r.content)
dst_path = new_version_dir
@@ -64,6 +104,17 @@ def backup_and_download(current_version, remote_version):
def patch_and_restart(path):
"""
一键更新协议:覆盖和重启
Args:
path (str): 新版本代码所在的路径
注意事项:
如果您的程序没有使用config_private.py私密配置文件,则会将config.py重命名为config_private.py以避免配置丢失。
更新流程:
- 复制最新版本代码到当前目录
- 更新pip包依赖
- 如果更新失败,则提示手动安装依赖库并重启
"""
from distutils import dir_util
import shutil
@@ -71,33 +122,44 @@ def patch_and_restart(path):
import sys
import time
import glob
from colorful import print亮黄, print亮绿, print亮红
# if not using config_private, move origin config.py as config_private.py
from shared_utils.colorful import log亮黄, log亮绿, log亮红
if not os.path.exists('config_private.py'):
print亮黄('由于您没有设置config_private.py私密配置,现将您的现有配置移动至config_private.py以防止配置丢失,',
log亮黄('由于您没有设置config_private.py私密配置,现将您的现有配置移动至config_private.py以防止配置丢失,',
'另外您可以随时在history子文件夹下找回旧版的程序。')
shutil.copyfile('config.py', 'config_private.py')
path_new_version = glob.glob(path + '/*-master')[0]
dir_util.copy_tree(path_new_version, './')
print亮绿('代码已经更新,即将更新pip包依赖……')
for i in reversed(range(5)): time.sleep(1); print(i)
dir_util.copy_tree(path_new_version, './') # ⭐ 将最新版本代码复制到当前目录
log亮绿('代码已经更新,即将更新pip包依赖……')
for i in reversed(range(5)): time.sleep(1); log亮绿(i)
try:
import subprocess
subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-r', 'requirements.txt'])
except:
print亮红('pip包依赖安装出现问题,需要手动安装新增的依赖库 `python -m pip install -r requirements.txt`,然后在用常规的`python main.py`的方式启动。')
print亮绿('更新完成,您可以随时在history子文件夹下找回旧版的程序,5s之后重启')
print亮红('假如重启失败,您可能需要手动安装新增的依赖库 `python -m pip install -r requirements.txt`,然后在用常规的`python main.py`的方式启动。')
print(' ------------------------------ -----------------------------------')
for i in reversed(range(8)): time.sleep(1); print(i)
os.execl(sys.executable, sys.executable, *sys.argv)
log亮红('pip包依赖安装出现问题,需要手动安装新增的依赖库 `python -m pip install -r requirements.txt`,然后在用常规的`python main.py`的方式启动。')
log亮绿('更新完成,您可以随时在history子文件夹下找回旧版的程序,5s之后重启')
log亮红('假如重启失败,您可能需要手动安装新增的依赖库 `python -m pip install -r requirements.txt`,然后在用常规的`python main.py`的方式启动。')
log亮绿(' ------------------------------ -----------------------------------')
for i in reversed(range(8)): time.sleep(1); log亮绿(i)
os.execl(sys.executable, sys.executable, *sys.argv) # 重启程序
def get_current_version():
"""
获取当前的版本号。
Returns:
str: 当前的版本号。如果无法获取版本号,则返回空字符串。
"""
import json
try:
with open('./version', 'r', encoding='utf8') as f:
current_version = json.loads(f.read())['version']
current_version = json.loads(f.read())['version'] # ⭐ 从读取的json数据中提取版本号
except:
current_version = ""
return current_version
@@ -106,6 +168,12 @@ def get_current_version():
def auto_update(raise_error=False):
"""
一键更新协议:查询版本和用户意见
Args:
raise_error (bool, optional): 是否在出错时抛出错误。默认为 False。
Returns:
None
"""
try:
from toolbox import get_conf
@@ -113,7 +181,7 @@ def auto_update(raise_error=False):
import json
proxies = get_conf('proxies')
try: response = requests.get("https://raw.githubusercontent.com/binary-husky/chatgpt_academic/master/version", proxies=proxies, timeout=5)
except: response = requests.get("https://public.gpt-academic.top/publish/version", proxies=proxies, timeout=5)
except: response = requests.get("https://public.agent-matrix.com/publish/version", proxies=proxies, timeout=5)
remote_json_data = json.loads(response.text)
remote_version = remote_json_data['version']
if remote_json_data["show_feature"]:
@@ -124,22 +192,22 @@ def auto_update(raise_error=False):
current_version = f.read()
current_version = json.loads(current_version)['version']
if (remote_version - current_version) >= 0.01-1e-5:
from colorful import print亮黄
print亮黄(f'\n新版本可用。新版本:{remote_version},当前版本:{current_version}{new_feature}')
print('1Github更新地址:\nhttps://github.com/binary-husky/chatgpt_academic\n')
from shared_utils.colorful import log亮黄
log亮黄(f'\n新版本可用。新版本:{remote_version},当前版本:{current_version}{new_feature}') # ⭐ 在控制台打印新版本信息
logger.info('1Github更新地址:\nhttps://github.com/binary-husky/chatgpt_academic\n')
user_instruction = input('2是否一键更新代码Y+回车=确认,输入其他/无输入+回车=不更新)?')
if user_instruction in ['Y', 'y']:
path = backup_and_download(current_version, remote_version)
path = backup_and_download(current_version, remote_version) # ⭐ 备份并下载文件
try:
patch_and_restart(path)
patch_and_restart(path) # ⭐ 执行覆盖并重启操作
except:
msg = '更新失败。'
if raise_error:
from toolbox import trimmed_format_exc
msg += trimmed_format_exc()
print(msg)
logger.warning(msg)
else:
print('自动更新程序:已禁用')
logger.info('自动更新程序:已禁用')
return
else:
return
@@ -148,10 +216,13 @@ def auto_update(raise_error=False):
if raise_error:
from toolbox import trimmed_format_exc
msg += trimmed_format_exc()
print(msg)
logger.info(msg)
def warm_up_modules():
print('正在执行一些模块的预热 ...')
"""
预热模块,加载特定模块并执行预热操作。
"""
logger.info('正在执行一些模块的预热 ...')
from toolbox import ProxyNetworkActivate
from request_llms.bridge_all import model_info
with ProxyNetworkActivate("Warmup_Modules"):
@@ -161,7 +232,17 @@ def warm_up_modules():
enc.encode("模块预热", disallowed_special=())
def warm_up_vectordb():
print('正在执行一些模块的预热 ...')
"""
执行一些模块的预热操作。
本函数主要用于执行一些模块的预热操作,确保在后续的流程中能够顺利运行。
⭐ 关键作用:预热模块
Returns:
None
"""
logger.info('正在执行一些模块的预热 ...')
from toolbox import ProxyNetworkActivate
with ProxyNetworkActivate("Warmup_Modules"):
import nltk

159
config.py
查看文件

@@ -30,11 +30,44 @@ if USE_PROXY:
else:
proxies = None
# ------------------------------------ 以下配置可以优化体验, 但大部分场合下并不需要修改 ------------------------------------
# [step 3]>> 模型选择是 (注意: LLM_MODEL是默认选中的模型, 它*必须*被包含在AVAIL_LLM_MODELS列表中 )
LLM_MODEL = "gpt-3.5-turbo-16k" # 可选 ↓↓↓
AVAIL_LLM_MODELS = ["gpt-4-1106-preview", "gpt-4-turbo-preview", "gpt-4-vision-preview",
"gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-4-turbo-2024-04-09",
"gpt-3.5-turbo-1106", "gpt-3.5-turbo-16k", "gpt-3.5-turbo", "azure-gpt-3.5",
"gpt-4", "gpt-4-32k", "azure-gpt-4", "glm-4", "glm-4v", "glm-3-turbo",
"gemini-1.5-pro", "chatglm3"
]
EMBEDDING_MODEL = "text-embedding-3-small"
# --- --- --- ---
# P.S. 其他可用的模型还包括
# AVAIL_LLM_MODELS = [
# "glm-4-0520", "glm-4-air", "glm-4-airx", "glm-4-flash",
# "qianfan", "deepseekcoder",
# "spark", "sparkv2", "sparkv3", "sparkv3.5", "sparkv4",
# "qwen-turbo", "qwen-plus", "qwen-max", "qwen-local",
# "moonshot-v1-128k", "moonshot-v1-32k", "moonshot-v1-8k",
# "gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k-0613", "gpt-3.5-turbo-0125", "gpt-4o-2024-05-13"
# "claude-3-haiku-20240307","claude-3-sonnet-20240229","claude-3-opus-20240229", "claude-2.1", "claude-instant-1.2",
# "moss", "llama2", "chatglm_onnx", "internlm", "jittorllms_pangualpha", "jittorllms_llama",
# "deepseek-chat" ,"deepseek-coder",
# "gemini-1.5-flash",
# "yi-34b-chat-0205","yi-34b-chat-200k","yi-large","yi-medium","yi-spark","yi-large-turbo","yi-large-preview",
# ]
# --- --- --- ---
# 此外,您还可以在接入one-api/vllm/ollama/Openroute时,
# 使用"one-api-*","vllm-*","ollama-*","openrouter-*"前缀直接使用非标准方式接入的模型,例如
# AVAIL_LLM_MODELS = ["one-api-claude-3-sonnet-20240229(max_token=100000)", "ollama-phi3(max_token=4096)","openrouter-openai/gpt-4o-mini","openrouter-openai/chatgpt-4o-latest"]
# --- --- --- ---
# --------------- 以下配置可以优化体验 ---------------
# 重新URL重新定向,实现更换API_URL的作用高危设置! 常规情况下不要修改! 通过修改此设置,您将把您的API-KEY和对话隐私完全暴露给您设定的中间人
# 格式: API_URL_REDIRECT = {"https://api.openai.com/v1/chat/completions": "在这里填写重定向的api.openai.com的URL"}
# 举例: API_URL_REDIRECT = {"https://api.openai.com/v1/chat/completions": "https://reverse-proxy-url/v1/chat/completions"}
# 举例: API_URL_REDIRECT = {"https://api.openai.com/v1/chat/completions": "https://reverse-proxy-url/v1/chat/completions", "http://localhost:11434/api/chat": "在这里填写您ollama的URL"}
API_URL_REDIRECT = {}
@@ -77,6 +110,10 @@ TIMEOUT_SECONDS = 30
WEB_PORT = -1
# 是否自动打开浏览器页面
AUTO_OPEN_BROWSER = True
# 如果OpenAI不响应网络卡顿、代理失败、KEY失效,重试的次数限制
MAX_RETRY = 2
@@ -85,20 +122,6 @@ MAX_RETRY = 2
DEFAULT_FN_GROUPS = ['对话', '编程', '学术', '智能体']
# 模型选择是 (注意: LLM_MODEL是默认选中的模型, 它*必须*被包含在AVAIL_LLM_MODELS列表中 )
LLM_MODEL = "gpt-3.5-turbo" # 可选 ↓↓↓
AVAIL_LLM_MODELS = ["gpt-3.5-turbo-1106","gpt-4-1106-preview","gpt-4-vision-preview",
"gpt-3.5-turbo-16k", "gpt-3.5-turbo", "azure-gpt-3.5",
"gpt-4", "gpt-4-32k", "azure-gpt-4", "api2d-gpt-4",
"gemini-pro", "chatglm3", "claude-2", "zhipuai"]
# P.S. 其他可用的模型还包括 [
# "moss", "qwen-turbo", "qwen-plus", "qwen-max"
# "zhipuai", "qianfan", "deepseekcoder", "llama2", "qwen-local", "gpt-3.5-turbo-0613",
# "gpt-3.5-turbo-16k-0613", "gpt-3.5-random", "api2d-gpt-3.5-turbo", 'api2d-gpt-3.5-turbo-16k',
# "spark", "sparkv2", "sparkv3", "chatglm_onnx", "claude-1-100k", "claude-2", "internlm", "jittorllms_pangualpha", "jittorllms_llama"
# ]
# 定义界面上“询问多个GPT模型”插件应该使用哪些模型,请从AVAIL_LLM_MODELS中选择,并在不同模型之间用`&`间隔,例如"gpt-3.5-turbo&chatglm3&azure-gpt-4"
MULTI_QUERY_LLM_MODELS = "gpt-3.5-turbo&chatglm3"
@@ -116,7 +139,7 @@ DASHSCOPE_API_KEY = "" # 阿里灵积云API_KEY
# 百度千帆LLM_MODEL="qianfan"
BAIDU_CLOUD_API_KEY = ''
BAIDU_CLOUD_SECRET_KEY = ''
BAIDU_CLOUD_QIANFAN_MODEL = 'ERNIE-Bot' # 可选 "ERNIE-Bot-4"(文心大模型4.0), "ERNIE-Bot"(文心一言), "ERNIE-Bot-turbo", "BLOOMZ-7B", "Llama-2-70B-Chat", "Llama-2-13B-Chat", "Llama-2-7B-Chat"
BAIDU_CLOUD_QIANFAN_MODEL = 'ERNIE-Bot' # 可选 "ERNIE-Bot-4"(文心大模型4.0), "ERNIE-Bot"(文心一言), "ERNIE-Bot-turbo", "BLOOMZ-7B", "Llama-2-70B-Chat", "Llama-2-13B-Chat", "Llama-2-7B-Chat", "ERNIE-Speed-128K", "ERNIE-Speed-8K", "ERNIE-Lite-8K"
# 如果使用ChatGLM2微调模型,请把 LLM_MODEL="chatglmft",并在此处指定模型路径
@@ -127,6 +150,7 @@ CHATGLM_PTUNING_CHECKPOINT = "" # 例如"/home/hmp/ChatGLM2-6B/ptuning/output/6b
LOCAL_MODEL_DEVICE = "cpu" # 可选 "cuda"
LOCAL_MODEL_QUANT = "FP16" # 默认 "FP16" "INT4" 启用量化INT4版本 "INT8" 启用量化INT8版本
# 设置gradio的并行线程数不需要修改
CONCURRENT_COUNT = 100
@@ -144,7 +168,8 @@ ADD_WAIFU = False
AUTHENTICATION = []
# 如果需要在二级路径下运行(常规情况下,不要修改!!需要配合修改main.py才能生效!
# 如果需要在二级路径下运行(常规情况下,不要修改!!
# (举例 CUSTOM_PATH = "/gpt_academic",可以让软件运行在 http://ip:port/gpt_academic/ 下。)
CUSTOM_PATH = "/"
@@ -172,14 +197,8 @@ AZURE_ENGINE = "填入你亲手写的部署名" # 读 docs\use_azure.
AZURE_CFG_ARRAY = {}
# 使用Newbing (不推荐使用,未来将删除)
NEWBING_STYLE = "creative" # ["creative", "balanced", "precise"]
NEWBING_COOKIES = """
put your new bing cookies here
"""
# 阿里云实时语音识别 配置难度较高 仅建议高手用户使用 参考 https://github.com/binary-husky/gpt_academic/blob/master/docs/use_audio.md
# 阿里云实时语音识别 配置难度较高
# 参考 https://github.com/binary-husky/gpt_academic/blob/master/docs/use_audio.md
ENABLE_AUDIO = False
ALIYUN_TOKEN="" # 例如 f37f30e0f9934c34a992f6f64f7eba4f
ALIYUN_APPKEY="" # 例如 RoPlZrM88DnAFkZK
@@ -187,6 +206,12 @@ ALIYUN_ACCESSKEY="" # (无需填写)
ALIYUN_SECRET="" # (无需填写)
# GPT-SOVITS 文本转语音服务的运行地址(将语言模型的生成文本朗读出来)
TTS_TYPE = "EDGE_TTS" # EDGE_TTS / LOCAL_SOVITS_API / DISABLE
GPT_SOVITS_URL = ""
EDGE_TTS_VOICE = "zh-CN-XiaoxiaoNeural"
# 接入讯飞星火大模型 https://console.xfyun.cn/services/iat
XFYUN_APPID = "00000000"
XFYUN_API_SECRET = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
@@ -195,19 +220,38 @@ XFYUN_API_KEY = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
# 接入智谱大模型
ZHIPUAI_API_KEY = ""
ZHIPUAI_MODEL = "glm-4" # 可选 "glm-3-turbo" "glm-4"
# # 火山引擎YUNQUE大模型
# YUNQUE_SECRET_KEY = ""
# YUNQUE_ACCESS_KEY = ""
# YUNQUE_MODEL = ""
ZHIPUAI_MODEL = "" # 此选项已废弃,不再需要填写
# Claude API KEY
ANTHROPIC_API_KEY = ""
# 月之暗面 API KEY
MOONSHOT_API_KEY = ""
# 零一万物(Yi Model) API KEY
YIMODEL_API_KEY = ""
# 深度求索(DeepSeek) API KEY,默认请求地址为"https://api.deepseek.com/v1/chat/completions"
DEEPSEEK_API_KEY = ""
# 紫东太初大模型 https://ai-maas.wair.ac.cn
TAICHU_API_KEY = ""
# Mathpix 拥有执行PDF的OCR功能,但是需要注册账号
MATHPIX_APPID = ""
MATHPIX_APPKEY = ""
# DOC2X的PDF解析服务,注册账号并获取API KEY: https://doc2x.noedgeai.com/login
DOC2X_API_KEY = ""
# 自定义API KEY格式
CUSTOM_API_KEY_PATTERN = ""
@@ -229,6 +273,10 @@ GROBID_URLS = [
]
# Searxng互联网检索服务
SEARXNG_URL = "https://cloud-1.agent-matrix.com/"
# 是否允许通过自然语言描述修改本页的配置,该功能具有一定的危险性,默认关闭
ALLOW_RESET_CONFIG = False
@@ -237,21 +285,21 @@ ALLOW_RESET_CONFIG = False
AUTOGEN_USE_DOCKER = False
# 临时的上传文件夹位置,请修改
# 临时的上传文件夹位置,请尽量不要修改
PATH_PRIVATE_UPLOAD = "private_upload"
# 日志文件夹的位置,请修改
# 日志文件夹的位置,请尽量不要修改
PATH_LOGGING = "gpt_log"
# 除了连接OpenAI之外,还有哪些场合允许使用代理,请勿修改
# 存储翻译好的arxiv论文的路径,请尽量不要修改
ARXIV_CACHE_DIR = "gpt_log/arxiv_cache"
# 除了连接OpenAI之外,还有哪些场合允许使用代理,请尽量不要修改
WHEN_TO_USE_PROXY = ["Download_LLM", "Download_Gradio_Theme", "Connect_Grobid",
"Warmup_Modules", "Nougat_Download", "AutoGen"]
# *实验性功能*: 自动检测并屏蔽失效的KEY,请勿使用
BLOCK_INVALID_APIKEY = False
"Warmup_Modules", "Nougat_Download", "AutoGen", "Connect_OpenAI_Embedding"]
# 启用插件热加载
@@ -261,7 +309,11 @@ PLUGIN_HOT_RELOAD = False
# 自定义按钮的最大数量限制
NUM_CUSTOM_BASIC_BTN = 4
"""
--------------- 配置关联关系说明 ---------------
在线大模型配置关联关系示意图
├── "gpt-3.5-turbo" 等openai模型
@@ -285,7 +337,7 @@ NUM_CUSTOM_BASIC_BTN = 4
│ ├── XFYUN_API_SECRET
│ └── XFYUN_API_KEY
├── "claude-1-100k" 等claude模型
├── "claude-3-opus-20240229" 等claude模型
│ └── ANTHROPIC_API_KEY
├── "stack-claude"
@@ -297,9 +349,11 @@ NUM_CUSTOM_BASIC_BTN = 4
│ ├── BAIDU_CLOUD_API_KEY
│ └── BAIDU_CLOUD_SECRET_KEY
├── "zhipuai" 智谱AI大模型chatglm_turbo
── ZHIPUAI_API_KEY
└── ZHIPUAI_MODEL
├── "glm-4", "glm-3-turbo", "zhipuai" 智谱AI大模型
── ZHIPUAI_API_KEY
├── "yi-34b-chat-0205", "yi-34b-chat-200k" 等零一万物(Yi Model)大模型
│ └── YIMODEL_API_KEY
├── "qwen-turbo" 等通义千问大模型
│ └── DASHSCOPE_API_KEY
@@ -307,9 +361,10 @@ NUM_CUSTOM_BASIC_BTN = 4
├── "Gemini"
│ └── GEMINI_API_KEY
└── "newbing" Newbing接口不再稳定,不推荐使用
├── NEWBING_STYLE
── NEWBING_COOKIES
└── "one-api-...(max_token=...)" 用一种更方便的方式接入one-api多模型管理界面
├── AVAIL_LLM_MODELS
── API_KEY
└── API_URL_REDIRECT
本地大模型示意图
@@ -343,6 +398,9 @@ NUM_CUSTOM_BASIC_BTN = 4
插件在线服务配置依赖关系示意图
├── 互联网检索
│ └── SEARXNG_URL
├── 语音功能
│ ├── ENABLE_AUDIO
│ ├── ALIYUN_TOKEN
@@ -351,6 +409,9 @@ NUM_CUSTOM_BASIC_BTN = 4
│ └── ALIYUN_SECRET
└── PDF文档精准解析
── GROBID_URLS
── GROBID_URLS
├── MATHPIX_APPID
└── MATHPIX_APPKEY
"""

查看文件

@@ -17,7 +17,7 @@ def get_core_functions():
text_show_english=
r"Below is a paragraph from an academic paper. Polish the writing to meet the academic style, "
r"improve the spelling, grammar, clarity, concision and overall readability. When necessary, rewrite the whole sentence. "
r"Firstly, you should provide the polished paragraph. "
r"Firstly, you should provide the polished paragraph (in English). "
r"Secondly, you should list all your modification and explain the reasons to do so in markdown table.",
text_show_chinese=
r"作为一名中文学术论文写作改进助理,你的任务是改进所提供文本的拼写、语法、清晰、简洁和整体可读性,"
@@ -33,17 +33,19 @@ def get_core_functions():
"AutoClearHistory": False,
# [6] 文本预处理 (可选参数,默认 None,举例写个函数移除所有的换行符
"PreProcess": None,
# [7] 模型选择 (可选参数。如不设置,则使用当前全局模型;如设置,则用指定模型覆盖全局模型。)
# "ModelOverride": "gpt-3.5-turbo", # 主要用途:强制点击此基础功能按钮时,使用指定的模型。
},
"总结绘制脑图": {
# 前缀,会被加在你的输入之前。例如,用来描述你的要求,例如翻译、解释代码、润色等等
"Prefix": r"",
"Prefix": '''"""\n\n''',
# 后缀,会被加在你的输入之后。例如,配合前缀可以把你的输入内容用引号圈起来
"Suffix":
# dedent() 函数用于去除多行字符串的缩进
dedent("\n"+r'''
==============================
dedent("\n\n"+r'''
"""
使用mermaid flowchart对以上文本进行总结,概括上述段落的内容以及内在逻辑关系,例如
@@ -57,7 +59,7 @@ def get_core_functions():
C --> |"箭头名2"| F["节点名6"]
```
警告
注意
1使用中文
2节点名字使用引号包裹,如["Laptop"]
3`|` 和 `"`之间不要存在空格

查看文件

@@ -1,46 +1,62 @@
from toolbox import HotReload # HotReload 的意思是热更新,修改函数插件后,不需要重启程序,代码直接生效
from toolbox import trimmed_format_exc
from loguru import logger
def get_crazy_functions():
from crazy_functions.读文章写摘要 import 读文章写摘要
from crazy_functions.生成函数注释 import 批量生成函数注释
from crazy_functions.解析项目源代码 import 解析项目本身
from crazy_functions.解析项目源代码 import 解析一个Python项目
from crazy_functions.解析项目源代码 import 解析一个Matlab项目
from crazy_functions.解析项目源代码 import 解析一个C项目的头文件
from crazy_functions.解析项目源代码 import 解析一个C项目
from crazy_functions.解析项目源代码 import 解析一个Golang项目
from crazy_functions.解析项目源代码 import 解析一个Rust项目
from crazy_functions.解析项目源代码 import 解析一个Java项目
from crazy_functions.解析项目源代码 import 解析一个前端项目
from crazy_functions.SourceCode_Analyse import 解析项目本身
from crazy_functions.SourceCode_Analyse import 解析一个Python项目
from crazy_functions.SourceCode_Analyse import 解析一个Matlab项目
from crazy_functions.SourceCode_Analyse import 解析一个C项目的头文件
from crazy_functions.SourceCode_Analyse import 解析一个C项目
from crazy_functions.SourceCode_Analyse import 解析一个Golang项目
from crazy_functions.SourceCode_Analyse import 解析一个Rust项目
from crazy_functions.SourceCode_Analyse import 解析一个Java项目
from crazy_functions.SourceCode_Analyse import 解析一个前端项目
from crazy_functions.高级功能函数模板 import 高阶功能模板函数
from crazy_functions.高级功能函数模板 import Demo_Wrap
from crazy_functions.Latex全文润色 import Latex英文润色
from crazy_functions.询问多个大语言模型 import 同时问询
from crazy_functions.解析项目源代码 import 解析一个Lua项目
from crazy_functions.解析项目源代码 import 解析一个CSharp项目
from crazy_functions.SourceCode_Analyse import 解析一个Lua项目
from crazy_functions.SourceCode_Analyse import 解析一个CSharp项目
from crazy_functions.总结word文档 import 总结word文档
from crazy_functions.解析JupyterNotebook import 解析ipynb文件
from crazy_functions.对话历史存档 import 对话历史存档
from crazy_functions.对话历史存档 import 载入对话历史存档
from crazy_functions.对话历史存档 import 删除所有本地对话历史记录
from crazy_functions.Conversation_To_File import 载入对话历史存档
from crazy_functions.Conversation_To_File import 对话历史存档
from crazy_functions.Conversation_To_File import Conversation_To_File_Wrap
from crazy_functions.Conversation_To_File import 删除所有本地对话历史记录
from crazy_functions.辅助功能 import 清除缓存
from crazy_functions.批量Markdown翻译 import Markdown英译中
from crazy_functions.Markdown_Translate import Markdown英译中
from crazy_functions.批量总结PDF文档 import 批量总结PDF文档
from crazy_functions.批量翻译PDF文档_多线程 import 批量翻译PDF文档
from crazy_functions.PDF_Translate import 批量翻译PDF文档
from crazy_functions.谷歌检索小助手 import 谷歌检索小助手
from crazy_functions.理解PDF文档内容 import 理解PDF文档内容标准文件输入
from crazy_functions.Latex全文润色 import Latex中文润色
from crazy_functions.Latex全文润色 import Latex英文纠错
from crazy_functions.批量Markdown翻译 import Markdown中译英
from crazy_functions.Markdown_Translate import Markdown中译英
from crazy_functions.虚空终端 import 虚空终端
from crazy_functions.生成多种Mermaid图表 import 生成多种Mermaid图表
from crazy_functions.生成多种Mermaid图表 import Mermaid_Gen
from crazy_functions.PDF_Translate_Wrap import PDF_Tran
from crazy_functions.Latex_Function import Latex英文纠错加PDF对比
from crazy_functions.Latex_Function import Latex翻译中文并重新编译PDF
from crazy_functions.Latex_Function import PDF翻译中文并重新编译PDF
from crazy_functions.Latex_Function_Wrap import Arxiv_Localize
from crazy_functions.Latex_Function_Wrap import PDF_Localize
from crazy_functions.Internet_GPT import 连接网络回答问题
from crazy_functions.Internet_GPT_Wrap import NetworkGPT_Wrap
from crazy_functions.Image_Generate import 图片生成_DALLE2, 图片生成_DALLE3, 图片修改_DALLE2
from crazy_functions.Image_Generate_Wrap import ImageGen_Wrap
from crazy_functions.SourceCode_Comment import 注释Python项目
from crazy_functions.SourceCode_Comment_Wrap import SourceCodeComment_Wrap
function_plugins = {
"虚空终端": {
"Group": "对话|编程|学术|智能体",
"Color": "stop",
"AsButton": True,
"Info": "使用自然语言实现您的想法",
"Function": HotReload(虚空终端),
},
"解析整个Python项目": {
@@ -50,6 +66,14 @@ def get_crazy_functions():
"Info": "解析一个Python项目的所有源文件(.py) | 输入参数为路径",
"Function": HotReload(解析一个Python项目),
},
"注释Python项目": {
"Group": "编程",
"Color": "stop",
"AsButton": False,
"Info": "上传一系列python源文件(或者压缩包), 为这些代码添加docstring | 输入参数为路径",
"Function": HotReload(注释Python项目),
"Class": SourceCodeComment_Wrap,
},
"载入对话历史存档(先上传存档或输入路径)": {
"Group": "对话",
"Color": "stop",
@@ -70,19 +94,26 @@ def get_crazy_functions():
"Info": "清除所有缓存文件,谨慎操作 | 不需要输入参数",
"Function": HotReload(清除缓存),
},
"生成多种Mermaid图表(从当前对话或文件(.pdf/.md)中生产图表)": {
"生成多种Mermaid图表(从当前对话或路径(.pdf/.md/.docx)中生产图表)": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"Info" : "基于当前对话或PDF生成多种Mermaid图表,图表类型由模型判断",
"Function": HotReload(生成多种Mermaid图表),
"AdvancedArgs": True,
"ArgsReminder": "请输入图类型对应的数字,不输入则为模型自行判断:1-流程图,2-序列图,3-类图,4-饼图,5-甘特图,6-状态图,7-实体关系图,8-象限提示图,9-思维导图",
"Info" : "基于当前对话或文件生成多种Mermaid图表,图表类型由模型判断",
"Function": None,
"Class": Mermaid_Gen
},
"Arxiv论文翻译": {
"Group": "学术",
"Color": "stop",
"AsButton": True,
"Info": "Arixv论文精细翻译 | 输入参数arxiv论文的ID,比如1812.10695",
"Function": HotReload(Latex翻译中文并重新编译PDF), # 当注册Class后,Function旧接口仅会在“虚空终端”中起作用
"Class": Arxiv_Localize, # 新一代插件需要注册Class
},
"批量总结Word文档": {
"Group": "学术",
"Color": "stop",
"AsButton": True,
"AsButton": False,
"Info": "批量总结word文档 | 输入参数为路径",
"Function": HotReload(总结word文档),
},
@@ -188,28 +219,42 @@ def get_crazy_functions():
},
"保存当前的对话": {
"Group": "对话",
"Color": "stop",
"AsButton": True,
"Info": "保存当前的对话 | 不需要输入参数",
"Function": HotReload(对话历史存档),
"Function": HotReload(对话历史存档), # 当注册Class后,Function旧接口仅会在“虚空终端”中起作用
"Class": Conversation_To_File_Wrap # 新一代插件需要注册Class
},
"[多线程Demo]解析此项目本身(源码自译解)": {
"Group": "对话|编程",
"Color": "stop",
"AsButton": False, # 加入下拉菜单中
"Info": "多线程解析并翻译此项目的源码 | 不需要输入参数",
"Function": HotReload(解析项目本身),
},
"查互联网后回答": {
"Group": "对话",
"Color": "stop",
"AsButton": True, # 加入下拉菜单中
# "Info": "连接网络回答问题(需要访问谷歌)| 输入参数是一个问题",
"Function": HotReload(连接网络回答问题),
"Class": NetworkGPT_Wrap # 新一代插件需要注册Class
},
"历史上的今天": {
"Group": "对话",
"AsButton": True,
"Color": "stop",
"AsButton": False,
"Info": "查看历史上的今天事件 (这是一个面向开发者的插件Demo) | 不需要输入参数",
"Function": HotReload(高阶功能模板函数),
"Function": None,
"Class": Demo_Wrap, # 新一代插件需要注册Class
},
"精准翻译PDF论文": {
"Group": "学术",
"Color": "stop",
"AsButton": True,
"Info": "精准翻译PDF论文为中文 | 输入参数为路径",
"Function": HotReload(批量翻译PDF文档),
"Function": HotReload(批量翻译PDF文档), # 当注册Class后,Function旧接口仅会在“虚空终端”中起作用
"Class": PDF_Tran, # 新一代插件需要注册Class
},
"询问多个GPT模型": {
"Group": "对话",
@@ -284,7 +329,84 @@ def get_crazy_functions():
"Info": "批量将Markdown文件中文翻译为英文 | 输入参数为路径或上传压缩包",
"Function": HotReload(Markdown中译英),
},
"Latex英文纠错+高亮修正位置 [需Latex]": {
"Group": "学术",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True,
"ArgsReminder": "如果有必要, 请在此处追加更细致的矫错指令(使用英文)。",
"Function": HotReload(Latex英文纠错加PDF对比),
},
"📚Arxiv论文精细翻译输入arxivID[需Latex]": {
"Group": "学术",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True,
"ArgsReminder": r"如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 "
r"例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: "
r'If the term "agent" is used in this section, it should be translated to "智能体". ',
"Info": "Arixv论文精细翻译 | 输入参数arxiv论文的ID,比如1812.10695",
"Function": HotReload(Latex翻译中文并重新编译PDF), # 当注册Class后,Function旧接口仅会在“虚空终端”中起作用
"Class": Arxiv_Localize, # 新一代插件需要注册Class
},
"📚本地Latex论文精细翻译上传Latex项目[需Latex]": {
"Group": "学术",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True,
"ArgsReminder": r"如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 "
r"例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: "
r'If the term "agent" is used in this section, it should be translated to "智能体". ',
"Info": "本地Latex论文精细翻译 | 输入参数是路径",
"Function": HotReload(Latex翻译中文并重新编译PDF),
},
"PDF翻译中文并重新编译PDF上传PDF[需Latex]": {
"Group": "学术",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True,
"ArgsReminder": r"如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 "
r"例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: "
r'If the term "agent" is used in this section, it should be translated to "智能体". ',
"Info": "PDF翻译中文,并重新编译PDF | 输入参数为路径",
"Function": HotReload(PDF翻译中文并重新编译PDF), # 当注册Class后,Function旧接口仅会在“虚空终端”中起作用
"Class": PDF_Localize # 新一代插件需要注册Class
}
}
function_plugins.update(
{
"🎨图片生成DALLE2/DALLE3, 使用前切换到GPT系列模型": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"Info": "使用 DALLE2/DALLE3 生成图片 | 输入参数字符串,提供图像的内容",
"Function": HotReload(图片生成_DALLE2), # 当注册Class后,Function旧接口仅会在“虚空终端”中起作用
"Class": ImageGen_Wrap # 新一代插件需要注册Class
},
}
)
function_plugins.update(
{
"🎨图片修改_DALLE2 使用前请切换模型到GPT系列": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": False, # 调用时,唤起高级参数输入区默认False
# "Info": "使用DALLE2修改图片 | 输入参数字符串,提供图像的内容",
"Function": HotReload(图片修改_DALLE2),
},
}
)
# -=--=- 尚未充分测试的实验性插件 & 需要额外依赖的插件 -=--=-
try:
@@ -302,42 +424,42 @@ def get_crazy_functions():
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
# try:
# from crazy_functions.联网的ChatGPT import 连接网络回答问题
# function_plugins.update(
# {
# "连接网络回答问题(输入问题后点击该插件,需要访问谷歌)": {
# "Group": "对话",
# "Color": "stop",
# "AsButton": False, # 加入下拉菜单中
# # "Info": "连接网络回答问题(需要访问谷歌)| 输入参数是一个问题",
# "Function": HotReload(连接网络回答问题),
# }
# }
# )
# from crazy_functions.联网的ChatGPT_bing版 import 连接bing搜索回答问题
# function_plugins.update(
# {
# "连接网络回答问题中文Bing版,输入问题后点击该插件": {
# "Group": "对话",
# "Color": "stop",
# "AsButton": False, # 加入下拉菜单中
# "Info": "连接网络回答问题需要访问中文Bing| 输入参数是一个问题",
# "Function": HotReload(连接bing搜索回答问题),
# }
# }
# )
# except:
# logger.error(trimmed_format_exc())
# logger.error("Load function plugin failed")
try:
from crazy_functions.联网的ChatGPT import 连接网络回答问题
function_plugins.update(
{
"连接网络回答问题(输入问题后点击该插件,需要访问谷歌)": {
"Group": "对话",
"Color": "stop",
"AsButton": False, # 加入下拉菜单中
# "Info": "连接网络回答问题(需要访问谷歌)| 输入参数是一个问题",
"Function": HotReload(连接网络回答问题),
}
}
)
from crazy_functions.联网的ChatGPT_bing版 import 连接bing搜索回答问题
function_plugins.update(
{
"连接网络回答问题中文Bing版,输入问题后点击该插件": {
"Group": "对话",
"Color": "stop",
"AsButton": False, # 加入下拉菜单中
"Info": "连接网络回答问题需要访问中文Bing| 输入参数是一个问题",
"Function": HotReload(连接bing搜索回答问题),
}
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
try:
from crazy_functions.解析项目源代码 import 解析任意code项目
from crazy_functions.SourceCode_Analyse import 解析任意code项目
function_plugins.update(
{
@@ -352,8 +474,8 @@ def get_crazy_functions():
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
try:
from crazy_functions.询问多个大语言模型 import 同时问询_指定模型
@@ -371,53 +493,10 @@ def get_crazy_functions():
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
try:
from crazy_functions.图片生成 import 图片生成_DALLE2, 图片生成_DALLE3, 图片修改_DALLE2
function_plugins.update(
{
"图片生成_DALLE2 先切换模型到gpt-*": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True, # 调用时,唤起高级参数输入区默认False
"ArgsReminder": "在这里输入分辨率, 如1024x1024默认,支持 256x256, 512x512, 1024x1024", # 高级参数输入区的显示提示
"Info": "使用DALLE2生成图片 | 输入参数字符串,提供图像的内容",
"Function": HotReload(图片生成_DALLE2),
},
}
)
function_plugins.update(
{
"图片生成_DALLE3 先切换模型到gpt-*": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True, # 调用时,唤起高级参数输入区默认False
"ArgsReminder": "在这里输入自定义参数「分辨率-质量(可选)-风格(可选)」, 参数示例「1024x1024-hd-vivid」 || 分辨率支持 「1024x1024」(默认) /「1792x1024」/「1024x1792」 || 质量支持 「-standard」(默认) /「-hd」 || 风格支持 「-vivid」(默认) /「-natural」", # 高级参数输入区的显示提示
"Info": "使用DALLE3生成图片 | 输入参数字符串,提供图像的内容",
"Function": HotReload(图片生成_DALLE3),
},
}
)
function_plugins.update(
{
"图片修改_DALLE2 先切换模型到gpt-*": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": False, # 调用时,唤起高级参数输入区默认False
# "Info": "使用DALLE2修改图片 | 输入参数字符串,提供图像的内容",
"Function": HotReload(图片修改_DALLE2),
},
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
try:
from crazy_functions.总结音视频 import 总结音视频
@@ -436,8 +515,8 @@ def get_crazy_functions():
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
try:
from crazy_functions.数学动画生成manim import 动画生成
@@ -454,11 +533,11 @@ def get_crazy_functions():
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
try:
from crazy_functions.批量Markdown翻译 import Markdown翻译指定语言
from crazy_functions.Markdown_Translate import Markdown翻译指定语言
function_plugins.update(
{
@@ -473,8 +552,8 @@ def get_crazy_functions():
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
try:
from crazy_functions.知识库问答 import 知识库文件注入
@@ -492,8 +571,8 @@ def get_crazy_functions():
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
try:
from crazy_functions.知识库问答 import 读取知识库作答
@@ -511,8 +590,8 @@ def get_crazy_functions():
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
try:
from crazy_functions.交互功能函数模板 import 交互功能模板函数
@@ -528,50 +607,9 @@ def get_crazy_functions():
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
try:
from crazy_functions.Latex输出PDF结果 import Latex英文纠错加PDF对比
from crazy_functions.Latex输出PDF结果 import Latex翻译中文并重新编译PDF
function_plugins.update(
{
"Latex英文纠错+高亮修正位置 [需Latex]": {
"Group": "学术",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True,
"ArgsReminder": "如果有必要, 请在此处追加更细致的矫错指令(使用英文)。",
"Function": HotReload(Latex英文纠错加PDF对比),
},
"Arxiv论文精细翻译输入arxivID[需Latex]": {
"Group": "学术",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True,
"ArgsReminder": "如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 "
+ "例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: "
+ 'If the term "agent" is used in this section, it should be translated to "智能体". ',
"Info": "Arixv论文精细翻译 | 输入参数arxiv论文的ID,比如1812.10695",
"Function": HotReload(Latex翻译中文并重新编译PDF),
},
"本地Latex论文精细翻译上传Latex项目[需Latex]": {
"Group": "学术",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True,
"ArgsReminder": "如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 "
+ "例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: "
+ 'If the term "agent" is used in this section, it should be translated to "智能体". ',
"Info": "本地Latex论文精细翻译 | 输入参数是路径",
"Function": HotReload(Latex翻译中文并重新编译PDF),
}
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
try:
from toolbox import get_conf
@@ -592,8 +630,8 @@ def get_crazy_functions():
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
try:
from crazy_functions.批量翻译PDF文档_NOUGAT import 批量翻译PDF文档
@@ -609,8 +647,8 @@ def get_crazy_functions():
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
try:
from crazy_functions.函数动态生成 import 函数动态生成
@@ -626,8 +664,8 @@ def get_crazy_functions():
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
try:
from crazy_functions.多智能体 import 多智能体终端
@@ -643,8 +681,8 @@ def get_crazy_functions():
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
try:
from crazy_functions.互动小游戏 import 随机小游戏
@@ -660,8 +698,33 @@ def get_crazy_functions():
}
)
except:
print(trimmed_format_exc())
print("Load function plugin failed")
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
try:
from crazy_functions.Rag_Interface import Rag问答
function_plugins.update(
{
"Rag智能召回": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"Info": "将问答数据记录到向量库中,作为长期参考。",
"Function": HotReload(Rag问答),
},
}
)
except:
logger.error(trimmed_format_exc())
logger.error("Load function plugin failed")
# try:
# from crazy_functions.高级功能函数模板 import 测试图表渲染
@@ -674,7 +737,7 @@ def get_crazy_functions():
# }
# })
# except:
# print(trimmed_format_exc())
# logger.error(trimmed_format_exc())
# print('Load function plugin failed')
# try:

查看文件

@@ -1,232 +0,0 @@
from collections.abc import Callable, Iterable, Mapping
from typing import Any
from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc
from toolbox import promote_file_to_downloadzone, get_log_folder
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import input_clipping, try_install_deps
from multiprocessing import Process, Pipe
import os
import time
templete = """
```python
import ... # Put dependencies here, e.g. import numpy as np
class TerminalFunction(object): # Do not change the name of the class, The name of the class must be `TerminalFunction`
def run(self, path): # The name of the function must be `run`, it takes only a positional argument.
# rewrite the function you have just written here
...
return generated_file_path
```
"""
def inspect_dependency(chatbot, history):
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return True
def get_code_block(reply):
import re
pattern = r"```([\s\S]*?)```" # regex pattern to match code blocks
matches = re.findall(pattern, reply) # find all code blocks in text
if len(matches) == 1:
return matches[0].strip('python') # code block
for match in matches:
if 'class TerminalFunction' in match:
return match.strip('python') # code block
raise RuntimeError("GPT is not generating proper code.")
def gpt_interact_multi_step(txt, file_type, llm_kwargs, chatbot, history):
# 输入
prompt_compose = [
f'Your job:\n'
f'1. write a single Python function, which takes a path of a `{file_type}` file as the only argument and returns a `string` containing the result of analysis or the path of generated files. \n',
f"2. You should write this function to perform following task: " + txt + "\n",
f"3. Wrap the output python function with markdown codeblock."
]
i_say = "".join(prompt_compose)
demo = []
# 第一步
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say, inputs_show_user=i_say,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=demo,
sys_prompt= r"You are a programmer."
)
history.extend([i_say, gpt_say])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 界面更新
# 第二步
prompt_compose = [
"If previous stage is successful, rewrite the function you have just written to satisfy following templete: \n",
templete
]
i_say = "".join(prompt_compose); inputs_show_user = "If previous stage is successful, rewrite the function you have just written to satisfy executable templete. "
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say, inputs_show_user=inputs_show_user,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
sys_prompt= r"You are a programmer."
)
code_to_return = gpt_say
history.extend([i_say, gpt_say])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 界面更新
# # 第三步
# i_say = "Please list to packages to install to run the code above. Then show me how to use `try_install_deps` function to install them."
# i_say += 'For instance. `try_install_deps(["opencv-python", "scipy", "numpy"])`'
# installation_advance = yield from request_gpt_model_in_new_thread_with_ui_alive(
# inputs=i_say, inputs_show_user=inputs_show_user,
# llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
# sys_prompt= r"You are a programmer."
# )
# # # 第三步
# i_say = "Show me how to use `pip` to install packages to run the code above. "
# i_say += 'For instance. `pip install -r opencv-python scipy numpy`'
# installation_advance = yield from request_gpt_model_in_new_thread_with_ui_alive(
# inputs=i_say, inputs_show_user=i_say,
# llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
# sys_prompt= r"You are a programmer."
# )
installation_advance = ""
return code_to_return, installation_advance, txt, file_type, llm_kwargs, chatbot, history
def make_module(code):
module_file = 'gpt_fn_' + gen_time_str().replace('-','_')
with open(f'{get_log_folder()}/{module_file}.py', 'w', encoding='utf8') as f:
f.write(code)
def get_class_name(class_string):
import re
# Use regex to extract the class name
class_name = re.search(r'class (\w+)\(', class_string).group(1)
return class_name
class_name = get_class_name(code)
return f"{get_log_folder().replace('/', '.')}.{module_file}->{class_name}"
def init_module_instance(module):
import importlib
module_, class_ = module.split('->')
init_f = getattr(importlib.import_module(module_), class_)
return init_f()
def for_immediate_show_off_when_possible(file_type, fp, chatbot):
if file_type in ['png', 'jpg']:
image_path = os.path.abspath(fp)
chatbot.append(['这是一张图片, 展示如下:',
f'本地文件地址: <br/>`{image_path}`<br/>'+
f'本地文件预览: <br/><div align="center"><img src="file={image_path}"></div>'
])
return chatbot
def subprocess_worker(instance, file_path, return_dict):
return_dict['result'] = instance.run(file_path)
def have_any_recent_upload_files(chatbot):
_5min = 5 * 60
if not chatbot: return False # chatbot is None
most_recent_uploaded = chatbot._cookies.get("most_recent_uploaded", None)
if not most_recent_uploaded: return False # most_recent_uploaded is None
if time.time() - most_recent_uploaded["time"] < _5min: return True # most_recent_uploaded is new
else: return False # most_recent_uploaded is too old
def get_recent_file_prompt_support(chatbot):
most_recent_uploaded = chatbot._cookies.get("most_recent_uploaded", None)
path = most_recent_uploaded['path']
return path
@CatchException
def 虚空终端CodeInterpreter(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
"""
txt 输入栏用户输入的文本,例如需要翻译的一段话,再例如一个包含了待处理文件的路径
llm_kwargs gpt模型参数,如温度和top_p等,一般原样传递下去就行
plugin_kwargs 插件模型的参数,暂时没有用武之地
chatbot 聊天显示框的句柄,用于显示给用户
history 聊天历史,前情提要
system_prompt 给gpt的静默提醒
user_request 当前用户的请求信息IP地址等
"""
raise NotImplementedError
# 清空历史,以免输入溢出
history = []; clear_file_downloadzone(chatbot)
# 基本信息:功能、贡献者
chatbot.append([
"函数插件功能?",
"CodeInterpreter开源版, 此插件处于开发阶段, 建议暂时不要使用, 插件初始化中 ..."
])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
if have_any_recent_upload_files(chatbot):
file_path = get_recent_file_prompt_support(chatbot)
else:
chatbot.append(["文件检索", "没有发现任何近期上传的文件。"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# 读取文件
if ("recently_uploaded_files" in plugin_kwargs) and (plugin_kwargs["recently_uploaded_files"] == ""): plugin_kwargs.pop("recently_uploaded_files")
recently_uploaded_files = plugin_kwargs.get("recently_uploaded_files", None)
file_path = recently_uploaded_files[-1]
file_type = file_path.split('.')[-1]
# 粗心检查
if is_the_upload_folder(txt):
chatbot.append([
"...",
f"请在输入框内填写需求,然后再次点击该插件(文件路径 {file_path} 已经被记忆)"
])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# 开始干正事
for j in range(5): # 最多重试5次
try:
code, installation_advance, txt, file_type, llm_kwargs, chatbot, history = \
yield from gpt_interact_multi_step(txt, file_type, llm_kwargs, chatbot, history)
code = get_code_block(code)
res = make_module(code)
instance = init_module_instance(res)
break
except Exception as e:
chatbot.append([f"{j}次代码生成尝试,失败了", f"错误追踪\n```\n{trimmed_format_exc()}\n```\n"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# 代码生成结束, 开始执行
try:
import multiprocessing
manager = multiprocessing.Manager()
return_dict = manager.dict()
p = multiprocessing.Process(target=subprocess_worker, args=(instance, file_path, return_dict))
# only has 10 seconds to run
p.start(); p.join(timeout=10)
if p.is_alive(): p.terminate(); p.join()
p.close()
res = return_dict['result']
# res = instance.run(file_path)
except Exception as e:
chatbot.append(["执行失败了", f"错误追踪\n```\n{trimmed_format_exc()}\n```\n"])
# chatbot.append(["如果是缺乏依赖,请参考以下建议", installation_advance])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# 顺利完成,收尾
res = str(res)
if os.path.exists(res):
chatbot.append(["执行成功了,结果是一个有效文件", "结果:" + res])
new_file_path = promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot = for_immediate_show_off_when_possible(file_type, new_file_path, chatbot)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 界面更新
else:
chatbot.append(["执行成功了,结果是一个字符串", "结果:" + res])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 界面更新
"""
测试:
裁剪图像,保留下半部分
交换图像的蓝色通道和红色通道
将图像转为灰度图像
将csv文件转excel表格
"""

查看文件

@@ -1,4 +1,5 @@
from toolbox import CatchException, update_ui, promote_file_to_downloadzone, get_log_folder, get_user
from crazy_functions.plugin_template.plugin_class_template import GptAcademicPluginTemplate, ArgProperty
import re
f_prefix = 'GPT-Academic对话存档'
@@ -9,27 +10,61 @@ def write_chat_to_file(chatbot, history=None, file_name=None):
"""
import os
import time
from themes.theme import advanced_css
if file_name is None:
file_name = f_prefix + time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime()) + '.html'
fp = os.path.join(get_log_folder(get_user(chatbot), plugin_name='chat_history'), file_name)
with open(fp, 'w', encoding='utf8') as f:
from themes.theme import advanced_css
f.write(f'<!DOCTYPE html><head><meta charset="utf-8"><title>对话历史</title><style>{advanced_css}</style></head>')
from textwrap import dedent
form = dedent("""
<!DOCTYPE html><head><meta charset="utf-8"><title>对话存档</title><style>{CSS}</style></head>
<body>
<div class="test_temp1" style="width:10%; height: 500px; float:left;"></div>
<div class="test_temp2" style="width:80%;padding: 40px;float:left;padding-left: 20px;padding-right: 20px;box-shadow: rgba(0, 0, 0, 0.2) 0px 0px 8px 8px;border-radius: 10px;">
<div class="chat-body" style="display: flex;justify-content: center;flex-direction: column;align-items: center;flex-wrap: nowrap;">
{CHAT_PREVIEW}
<div></div>
<div></div>
<div style="text-align: center;width:80%;padding: 0px;float:left;padding-left:20px;padding-right:20px;box-shadow: rgba(0, 0, 0, 0.05) 0px 0px 1px 2px;border-radius: 1px;">对话原始数据</div>
{HISTORY_PREVIEW}
</div>
</div>
<div class="test_temp3" style="width:10%; height: 500px; float:left;"></div>
</body>
""")
qa_from = dedent("""
<div class="QaBox" style="width:80%;padding: 20px;margin-bottom: 20px;box-shadow: rgb(0 255 159 / 50%) 0px 0px 1px 2px;border-radius: 4px;">
<div class="Question" style="border-radius: 2px;">{QUESTION}</div>
<hr color="blue" style="border-top: dotted 2px #ccc;">
<div class="Answer" style="border-radius: 2px;">{ANSWER}</div>
</div>
""")
history_from = dedent("""
<div class="historyBox" style="width:80%;padding: 0px;float:left;padding-left:20px;padding-right:20px;box-shadow: rgba(0, 0, 0, 0.05) 0px 0px 1px 2px;border-radius: 1px;">
<div class="entry" style="border-radius: 2px;">{ENTRY}</div>
</div>
""")
CHAT_PREVIEW_BUF = ""
for i, contents in enumerate(chatbot):
for j, content in enumerate(contents):
try: # 这个bug没找到触发条件,暂时先这样顶一下
if type(content) != str: content = str(content)
except:
continue
f.write(content)
if j == 0:
f.write('<hr style="border-top: dotted 3px #ccc;">')
f.write('<hr color="red"> \n\n')
f.write('<hr color="blue"> \n\n raw chat context:\n')
f.write('<code>')
question, answer = contents[0], contents[1]
if question is None: question = ""
try: question = str(question)
except: question = ""
if answer is None: answer = ""
try: answer = str(answer)
except: answer = ""
CHAT_PREVIEW_BUF += qa_from.format(QUESTION=question, ANSWER=answer)
HISTORY_PREVIEW_BUF = ""
for h in history:
f.write("\n>>>" + h)
f.write('</code>')
HISTORY_PREVIEW_BUF += history_from.format(ENTRY=h)
html_content = form.format(CHAT_PREVIEW=CHAT_PREVIEW_BUF, HISTORY_PREVIEW=HISTORY_PREVIEW_BUF, CSS=advanced_css)
f.write(html_content)
promote_file_to_downloadzone(fp, rename_file=file_name, chatbot=chatbot)
return '对话历史写入:' + fp
@@ -40,7 +75,7 @@ def gen_file_preview(file_name):
# pattern to match the text between <head> and </head>
pattern = re.compile(r'<head>.*?</head>', flags=re.DOTALL)
file_content = re.sub(pattern, '', file_content)
html, history = file_content.split('<hr color="blue"> \n\n raw chat context:\n')
html, history = file_content.split('<hr color="blue"> \n\n 对话数据 (无渲染):\n')
history = history.strip('<code>')
history = history.strip('</code>')
history = history.split("\n>>>")
@@ -51,21 +86,25 @@ def gen_file_preview(file_name):
def read_file_to_chat(chatbot, history, file_name):
with open(file_name, 'r', encoding='utf8') as f:
file_content = f.read()
# pattern to match the text between <head> and </head>
pattern = re.compile(r'<head>.*?</head>', flags=re.DOTALL)
file_content = re.sub(pattern, '', file_content)
html, history = file_content.split('<hr color="blue"> \n\n raw chat context:\n')
history = history.strip('<code>')
history = history.strip('</code>')
history = history.split("\n>>>")
history = list(filter(lambda x:x!="", history))
html = html.split('<hr color="red"> \n\n')
html = list(filter(lambda x:x!="", html))
from bs4 import BeautifulSoup
soup = BeautifulSoup(file_content, 'lxml')
# 提取QaBox信息
chatbot.clear()
for i, h in enumerate(html):
i_say, gpt_say = h.split('<hr style="border-top: dotted 3px #ccc;">')
chatbot.append([i_say, gpt_say])
chatbot.append([f"存档文件详情?", f"[Local Message] 载入对话{len(html)}条,上下文{len(history)}条。"])
qa_box_list = []
qa_boxes = soup.find_all("div", class_="QaBox")
for box in qa_boxes:
question = box.find("div", class_="Question").get_text(strip=False)
answer = box.find("div", class_="Answer").get_text(strip=False)
qa_box_list.append({"Question": question, "Answer": answer})
chatbot.append([question, answer])
# 提取historyBox信息
history_box_list = []
history_boxes = soup.find_all("div", class_="historyBox")
for box in history_boxes:
entry = box.find("div", class_="entry").get_text(strip=False)
history_box_list.append(entry)
history = history_box_list
chatbot.append([None, f"[Local Message] 载入对话{len(qa_box_list)}条,上下文{len(history)}条。"])
return chatbot, history
@CatchException
@@ -79,11 +118,42 @@ def 对话历史存档(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_
system_prompt 给gpt的静默提醒
user_request 当前用户的请求信息IP地址等
"""
file_name = plugin_kwargs.get("file_name", None)
if (file_name is not None) and (file_name != "") and (not file_name.endswith('.html')): file_name += '.html'
else: file_name = None
chatbot.append(("保存当前对话",
f"[Local Message] {write_chat_to_file(chatbot, history)},您可以调用下拉菜单中的“载入对话历史存档”还原当下的对话。"))
chatbot.append((None, f"[Local Message] {write_chat_to_file(chatbot, history, file_name)},您可以调用下拉菜单中的“载入对话历史存档”还原当下的对话"))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 由于请求gpt需要一段时间,我们先及时地做一次界面更新
class Conversation_To_File_Wrap(GptAcademicPluginTemplate):
def __init__(self):
"""
请注意`execute`会执行在不同的线程中因此您在定义和使用类变量时应当慎之又慎
"""
pass
def define_arg_selection_menu(self):
"""
定义插件的二级选项菜单
第一个参数名称`file_name`参数`type`声明这是一个文本框文本框上方显示`title`文本框内部显示`description``default_value`为默认值
"""
gui_definition = {
"file_name": ArgProperty(title="保存文件名", description="输入对话存档文件名,留空则使用时间作为文件名", default_value="", type="string").model_dump_json(), # 主输入,自动从输入框同步
}
return gui_definition
def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
"""
执行插件
"""
yield from 对话历史存档(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
def hide_cwd(str):
import os
current_path = os.getcwd()
@@ -101,7 +171,7 @@ def 载入对话历史存档(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
system_prompt 给gpt的静默提醒
user_request 当前用户的请求信息IP地址等
"""
from .crazy_utils import get_files_from_everything
from crazy_functions.crazy_utils import get_files_from_everything
success, file_manifest, _ = get_files_from_everything(txt, type='.html')
if not success:
@@ -148,5 +218,3 @@ def 删除所有本地对话历史记录(txt, llm_kwargs, plugin_kwargs, chatbot
chatbot.append([f"删除所有历史对话文件", f"已删除<br/>{local_history}"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return

查看文件

@@ -30,7 +30,7 @@ def gen_image(llm_kwargs, prompt, resolution="1024x1024", model="dall-e-2", qual
if style is not None:
data['style'] = style
response = requests.post(url, headers=headers, json=data, proxies=proxies)
print(response.content)
# logger.info(response.content)
try:
image_url = json.loads(response.content.decode('utf8'))['data'][0]['url']
except:
@@ -76,7 +76,7 @@ def edit_image(llm_kwargs, prompt, image_path, resolution="1024x1024", model="da
}
response = requests.post(url, headers=headers, files=files, proxies=proxies)
print(response.content)
# logger.info(response.content)
try:
image_url = json.loads(response.content.decode('utf8'))['data'][0]['url']
except:
@@ -108,7 +108,7 @@ def 图片生成_DALLE2(prompt, llm_kwargs, plugin_kwargs, chatbot, history, sys
chatbot.append((prompt, "[Local Message] 图像生成提示为空白,请在“输入区”输入图像生成提示。"))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 界面更新
return
chatbot.append(("您正在调用“图像生成”插件。", "[Local Message] 生成图像, 请先把模型切换至gpt-*。如果中文Prompt效果不理想, 请尝试英文Prompt。正在处理中 ....."))
chatbot.append(("您正在调用“图像生成”插件。", "[Local Message] 生成图像, 使用前请切换模型到GPT系列。如果中文Prompt效果不理想, 请尝试英文Prompt。正在处理中 ....."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 由于请求gpt需要一段时间,我们先及时地做一次界面更新
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
resolution = plugin_kwargs.get("advanced_arg", '1024x1024')
@@ -129,7 +129,7 @@ def 图片生成_DALLE3(prompt, llm_kwargs, plugin_kwargs, chatbot, history, sys
chatbot.append((prompt, "[Local Message] 图像生成提示为空白,请在“输入区”输入图像生成提示。"))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 界面更新
return
chatbot.append(("您正在调用“图像生成”插件。", "[Local Message] 生成图像, 请先把模型切换至gpt-*。如果中文Prompt效果不理想, 请尝试英文Prompt。正在处理中 ....."))
chatbot.append(("您正在调用“图像生成”插件。", "[Local Message] 生成图像, 使用前请切换模型到GPT系列。如果中文Prompt效果不理想, 请尝试英文Prompt。正在处理中 ....."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 由于请求gpt需要一段时间,我们先及时地做一次界面更新
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
resolution_arg = plugin_kwargs.get("advanced_arg", '1024x1024-standard-vivid').lower()
@@ -166,7 +166,7 @@ class ImageEditState(GptAcademicState):
return confirm, file
def lock_plugin(self, chatbot):
chatbot._cookies['lock_plugin'] = 'crazy_functions.图片生成->图片修改_DALLE2'
chatbot._cookies['lock_plugin'] = 'crazy_functions.Image_Generate->图片修改_DALLE2'
self.dump_state(chatbot)
def unlock_plugin(self, chatbot):

查看文件

@@ -0,0 +1,56 @@
from toolbox import get_conf, update_ui
from crazy_functions.Image_Generate import 图片生成_DALLE2, 图片生成_DALLE3, 图片修改_DALLE2
from crazy_functions.plugin_template.plugin_class_template import GptAcademicPluginTemplate, ArgProperty
class ImageGen_Wrap(GptAcademicPluginTemplate):
def __init__(self):
"""
请注意`execute`会执行在不同的线程中,因此您在定义和使用类变量时,应当慎之又慎!
"""
pass
def define_arg_selection_menu(self):
"""
定义插件的二级选项菜单
第一个参数,名称`main_input`,参数`type`声明这是一个文本框,文本框上方显示`title`,文本框内部显示`description`,`default_value`为默认值;
第二个参数,名称`advanced_arg`,参数`type`声明这是一个文本框,文本框上方显示`title`,文本框内部显示`description`,`default_value`为默认值;
"""
gui_definition = {
"main_input":
ArgProperty(title="输入图片描述", description="需要生成图像的文本描述,尽量使用英文", default_value="", type="string").model_dump_json(), # 主输入,自动从输入框同步
"model_name":
ArgProperty(title="模型", options=["DALLE2", "DALLE3"], default_value="DALLE3", description="", type="dropdown").model_dump_json(),
"resolution":
ArgProperty(title="分辨率", options=["256x256(限DALLE2)", "512x512(限DALLE2)", "1024x1024", "1792x1024(限DALLE3)", "1024x1792(限DALLE3)"], default_value="1024x1024", description="", type="dropdown").model_dump_json(),
"quality (仅DALLE3生效)":
ArgProperty(title="质量", options=["standard", "hd"], default_value="standard", description="", type="dropdown").model_dump_json(),
"style (仅DALLE3生效)":
ArgProperty(title="风格", options=["vivid", "natural"], default_value="vivid", description="", type="dropdown").model_dump_json(),
}
return gui_definition
def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
"""
执行插件
"""
# 分辨率
resolution = plugin_kwargs["resolution"].replace("(限DALLE2)", "").replace("(限DALLE3)", "")
if plugin_kwargs["model_name"] == "DALLE2":
plugin_kwargs["advanced_arg"] = resolution
yield from 图片生成_DALLE2(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
elif plugin_kwargs["model_name"] == "DALLE3":
quality = plugin_kwargs["quality (仅DALLE3生效)"]
style = plugin_kwargs["style (仅DALLE3生效)"]
plugin_kwargs["advanced_arg"] = f"{resolution}-{quality}-{style}"
yield from 图片生成_DALLE3(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
else:
chatbot.append([None, "抱歉,找不到该模型"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面

查看文件

@@ -0,0 +1,278 @@
import requests
import random
import time
import re
import json
from bs4 import BeautifulSoup
from functools import lru_cache
from itertools import zip_longest
from check_proxy import check_proxy
from toolbox import CatchException, update_ui, get_conf
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, input_clipping
from request_llms.bridge_all import model_info
from request_llms.bridge_all import predict_no_ui_long_connection
from crazy_functions.prompts.internet import SearchOptimizerPrompt, SearchAcademicOptimizerPrompt
def search_optimizer(
query,
proxies,
history,
llm_kwargs,
optimizer=1,
categories="general",
searxng_url=None,
engines=None,
):
# ------------- < 第1步尝试进行搜索优化 > -------------
# * 增强优化,会尝试结合历史记录进行搜索优化
if optimizer == 2:
his = " "
if len(history) == 0:
pass
else:
for i, h in enumerate(history):
if i % 2 == 0:
his += f"Q: {h}\n"
else:
his += f"A: {h}\n"
if categories == "general":
sys_prompt = SearchOptimizerPrompt.format(query=query, history=his, num=4)
elif categories == "science":
sys_prompt = SearchAcademicOptimizerPrompt.format(query=query, history=his, num=4)
else:
his = " "
if categories == "general":
sys_prompt = SearchOptimizerPrompt.format(query=query, history=his, num=3)
elif categories == "science":
sys_prompt = SearchAcademicOptimizerPrompt.format(query=query, history=his, num=3)
mutable = ["", time.time(), ""]
llm_kwargs["temperature"] = 0.8
try:
querys_json = predict_no_ui_long_connection(
inputs=query,
llm_kwargs=llm_kwargs,
history=[],
sys_prompt=sys_prompt,
observe_window=mutable,
)
except Exception:
querys_json = "1234"
#* 尝试解码优化后的搜索结果
querys_json = re.sub(r"```json|```", "", querys_json)
try:
querys = json.loads(querys_json)
except Exception:
#* 如果解码失败,降低温度再试一次
try:
llm_kwargs["temperature"] = 0.4
querys_json = predict_no_ui_long_connection(
inputs=query,
llm_kwargs=llm_kwargs,
history=[],
sys_prompt=sys_prompt,
observe_window=mutable,
)
querys_json = re.sub(r"```json|```", "", querys_json)
querys = json.loads(querys_json)
except Exception:
#* 如果再次失败,直接返回原始问题
querys = [query]
links = []
success = 0
Exceptions = ""
for q in querys:
try:
link = searxng_request(q, proxies, categories, searxng_url, engines=engines)
if len(link) > 0:
links.append(link[:-5])
success += 1
except Exception:
Exceptions = Exception
pass
if success == 0:
raise ValueError(f"在线搜索失败!\n{Exceptions}")
# * 清洗搜索结果,依次放入每组第一,第二个搜索结果,并清洗重复的搜索结果
seen_links = set()
result = []
for tuple in zip_longest(*links, fillvalue=None):
for item in tuple:
if item is not None:
link = item["link"]
if link not in seen_links:
seen_links.add(link)
result.append(item)
return result
@lru_cache
def get_auth_ip():
ip = check_proxy(None, return_ip=True)
if ip is None:
return '114.114.114.' + str(random.randint(1, 10))
return ip
def searxng_request(query, proxies, categories='general', searxng_url=None, engines=None):
if searxng_url is None:
url = get_conf("SEARXNG_URL")
else:
url = searxng_url
if engines == "Mixed":
engines = None
if categories == 'general':
params = {
'q': query, # 搜索查询
'format': 'json', # 输出格式为JSON
'language': 'zh', # 搜索语言
'engines': engines,
}
elif categories == 'science':
params = {
'q': query, # 搜索查询
'format': 'json', # 输出格式为JSON
'language': 'zh', # 搜索语言
'categories': 'science'
}
else:
raise ValueError('不支持的检索类型')
headers = {
'Accept-Language': 'zh-CN,zh;q=0.9',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
'X-Forwarded-For': get_auth_ip(),
'X-Real-IP': get_auth_ip()
}
results = []
response = requests.post(url, params=params, headers=headers, proxies=proxies, timeout=30)
if response.status_code == 200:
json_result = response.json()
for result in json_result['results']:
item = {
"title": result.get("title", ""),
"source": result.get("engines", "unknown"),
"content": result.get("content", ""),
"link": result["url"],
}
results.append(item)
return results
else:
if response.status_code == 429:
raise ValueError("Searxng在线搜索服务当前使用人数太多,请稍后。")
else:
raise ValueError("在线搜索失败,状态码: " + str(response.status_code) + '\t' + response.content.decode('utf-8'))
def scrape_text(url, proxies) -> str:
"""Scrape text from a webpage
Args:
url (str): The URL to scrape text from
Returns:
str: The scraped text
"""
headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36',
'Content-Type': 'text/plain',
}
try:
response = requests.get(url, headers=headers, proxies=proxies, timeout=8)
if response.encoding == "ISO-8859-1": response.encoding = response.apparent_encoding
except:
return "无法连接到该网页"
soup = BeautifulSoup(response.text, "html.parser")
for script in soup(["script", "style"]):
script.extract()
text = soup.get_text()
lines = (line.strip() for line in text.splitlines())
chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
text = "\n".join(chunk for chunk in chunks if chunk)
return text
@CatchException
def 连接网络回答问题(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
optimizer_history = history[:-8]
history = [] # 清空历史,以免输入溢出
chatbot.append((f"请结合互联网信息回答以下问题:{txt}", "检索中..."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# ------------- < 第1步爬取搜索引擎的结果 > -------------
from toolbox import get_conf
proxies = get_conf('proxies')
categories = plugin_kwargs.get('categories', 'general')
searxng_url = plugin_kwargs.get('searxng_url', None)
engines = plugin_kwargs.get('engine', None)
optimizer = plugin_kwargs.get('optimizer', "关闭")
if optimizer == "关闭":
urls = searxng_request(txt, proxies, categories, searxng_url, engines=engines)
else:
urls = search_optimizer(txt, proxies, optimizer_history, llm_kwargs, optimizer, categories, searxng_url, engines)
history = []
if len(urls) == 0:
chatbot.append((f"结论:{txt}",
"[Local Message] 受到限制,无法从searxng获取信息请尝试更换搜索引擎。"))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# ------------- < 第2步依次访问网页 > -------------
max_search_result = 5 # 最多收纳多少个网页的结果
if optimizer == "开启(增强)":
max_search_result = 8
chatbot.append(["联网检索中 ...", None])
for index, url in enumerate(urls[:max_search_result]):
res = scrape_text(url['link'], proxies)
prefix = f"{index}份搜索结果 [源自{url['source'][0]}搜索] {url['title'][:25]}"
history.extend([prefix, res])
res_squeeze = res.replace('\n', '...')
chatbot[-1] = [prefix + "\n\n" + res_squeeze[:500] + "......", None]
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# ------------- < 第3步ChatGPT综合 > -------------
if (optimizer != "开启(增强)"):
i_say = f"从以上搜索结果中抽取信息,然后回答问题:{txt}"
i_say, history = input_clipping( # 裁剪输入,从最长的条目开始裁剪,防止爆token
inputs=i_say,
history=history,
max_token_limit=min(model_info[llm_kwargs['llm_model']]['max_token']*3//4, 8192)
)
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say, inputs_show_user=i_say,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
sys_prompt="请从给定的若干条搜索结果中抽取信息,对最相关的两个搜索结果进行总结,然后回答问题。"
)
chatbot[-1] = (i_say, gpt_say)
history.append(i_say);history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 界面更新
#* 或者使用搜索优化器,这样可以保证后续问答能读取到有效的历史记录
else:
i_say = f"从以上搜索结果中抽取与问题:{txt} 相关的信息:"
i_say, history = input_clipping( # 裁剪输入,从最长的条目开始裁剪,防止爆token
inputs=i_say,
history=history,
max_token_limit=min(model_info[llm_kwargs['llm_model']]['max_token']*3//4, 8192)
)
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say, inputs_show_user=i_say,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
sys_prompt="请从给定的若干条搜索结果中抽取信息,对最相关的三个搜索结果进行总结"
)
chatbot[-1] = (i_say, gpt_say)
history = []
history.append(i_say);history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 界面更新
# ------------- < 第4步根据综合回答问题 > -------------
i_say = f"请根据以上搜索结果回答问题:{txt}"
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say, inputs_show_user=i_say,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
sys_prompt="请根据给定的若干条搜索结果回答问题"
)
chatbot[-1] = (i_say, gpt_say)
history.append(i_say);history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history)

查看文件

@@ -0,0 +1,45 @@
from toolbox import get_conf
from crazy_functions.Internet_GPT import 连接网络回答问题
from crazy_functions.plugin_template.plugin_class_template import GptAcademicPluginTemplate, ArgProperty
class NetworkGPT_Wrap(GptAcademicPluginTemplate):
def __init__(self):
"""
请注意`execute`会执行在不同的线程中,因此您在定义和使用类变量时,应当慎之又慎!
"""
pass
def define_arg_selection_menu(self):
"""
定义插件的二级选项菜单
第一个参数,名称`main_input`,参数`type`声明这是一个文本框,文本框上方显示`title`,文本框内部显示`description`,`default_value`为默认值;
第二个参数,名称`advanced_arg`,参数`type`声明这是一个文本框,文本框上方显示`title`,文本框内部显示`description`,`default_value`为默认值;
第三个参数,名称`allow_cache`,参数`type`声明这是一个下拉菜单,下拉菜单上方显示`title`+`description`,下拉菜单的选项为`options`,`default_value`为下拉菜单默认值;
"""
gui_definition = {
"main_input":
ArgProperty(title="输入问题", description="待通过互联网检索的问题,会自动读取输入框内容", default_value="", type="string").model_dump_json(), # 主输入,自动从输入框同步
"categories":
ArgProperty(title="搜索分类", options=["网页", "学术论文"], default_value="网页", description="", type="dropdown").model_dump_json(),
"engine":
ArgProperty(title="选择搜索引擎", options=["Mixed", "bing", "google", "duckduckgo"], default_value="google", description="", type="dropdown").model_dump_json(),
"optimizer":
ArgProperty(title="搜索优化", options=["关闭", "开启", "开启(增强)"], default_value="关闭", description="是否使用搜索增强。注意这可能会消耗较多token", type="dropdown").model_dump_json(),
"searxng_url":
ArgProperty(title="Searxng服务地址", description="输入Searxng的地址", default_value=get_conf("SEARXNG_URL"), type="string").model_dump_json(), # 主输入,自动从输入框同步
}
return gui_definition
def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
"""
执行插件
"""
if plugin_kwargs["categories"] == "网页": plugin_kwargs["categories"] = "general"
if plugin_kwargs["categories"] == "学术论文": plugin_kwargs["categories"] = "science"
yield from 连接网络回答问题(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)

查看文件

@@ -0,0 +1,595 @@
from toolbox import update_ui, trimmed_format_exc, get_conf, get_log_folder, promote_file_to_downloadzone, check_repeat_upload, map_file_to_sha256
from toolbox import CatchException, report_exception, update_ui_lastest_msg, zip_result, gen_time_str
from functools import partial
from loguru import logger
import glob, os, requests, time, json, tarfile, threading
pj = os.path.join
ARXIV_CACHE_DIR = get_conf("ARXIV_CACHE_DIR")
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 工具函数 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
# 专业词汇声明 = 'If the term "agent" is used in this section, it should be translated to "智能体". '
def switch_prompt(pfg, mode, more_requirement):
"""
Generate prompts and system prompts based on the mode for proofreading or translating.
Args:
- pfg: Proofreader or Translator instance.
- mode: A string specifying the mode, either 'proofread' or 'translate_zh'.
Returns:
- inputs_array: A list of strings containing prompts for users to respond to.
- sys_prompt_array: A list of strings containing prompts for system prompts.
"""
n_split = len(pfg.sp_file_contents)
if mode == 'proofread_en':
inputs_array = [r"Below is a section from an academic paper, proofread this section." +
r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " + more_requirement +
r"Answer me only with the revised text:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
sys_prompt_array = ["You are a professional academic paper writer." for _ in range(n_split)]
elif mode == 'translate_zh':
inputs_array = [
r"Below is a section from an English academic paper, translate it into Chinese. " + more_requirement +
r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " +
r"Answer me only with the translated text:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
sys_prompt_array = ["You are a professional translator." for _ in range(n_split)]
else:
assert False, "未知指令"
return inputs_array, sys_prompt_array
def desend_to_extracted_folder_if_exist(project_folder):
"""
Descend into the extracted folder if it exists, otherwise return the original folder.
Args:
- project_folder: A string specifying the folder path.
Returns:
- A string specifying the path to the extracted folder, or the original folder if there is no extracted folder.
"""
maybe_dir = [f for f in glob.glob(f'{project_folder}/*') if os.path.isdir(f)]
if len(maybe_dir) == 0: return project_folder
if maybe_dir[0].endswith('.extract'): return maybe_dir[0]
return project_folder
def move_project(project_folder, arxiv_id=None):
"""
Create a new work folder and copy the project folder to it.
Args:
- project_folder: A string specifying the folder path of the project.
Returns:
- A string specifying the path to the new work folder.
"""
import shutil, time
time.sleep(2) # avoid time string conflict
if arxiv_id is not None:
new_workfolder = pj(ARXIV_CACHE_DIR, arxiv_id, 'workfolder')
else:
new_workfolder = f'{get_log_folder()}/{gen_time_str()}'
try:
shutil.rmtree(new_workfolder)
except:
pass
# align subfolder if there is a folder wrapper
items = glob.glob(pj(project_folder, '*'))
items = [item for item in items if os.path.basename(item) != '__MACOSX']
if len(glob.glob(pj(project_folder, '*.tex'))) == 0 and len(items) == 1:
if os.path.isdir(items[0]): project_folder = items[0]
shutil.copytree(src=project_folder, dst=new_workfolder)
return new_workfolder
def arxiv_download(chatbot, history, txt, allow_cache=True):
def check_cached_translation_pdf(arxiv_id):
translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'translation')
if not os.path.exists(translation_dir):
os.makedirs(translation_dir)
target_file = pj(translation_dir, 'translate_zh.pdf')
if os.path.exists(target_file):
promote_file_to_downloadzone(target_file, rename_file=None, chatbot=chatbot)
target_file_compare = pj(translation_dir, 'comparison.pdf')
if os.path.exists(target_file_compare):
promote_file_to_downloadzone(target_file_compare, rename_file=None, chatbot=chatbot)
return target_file
return False
def is_float(s):
try:
float(s)
return True
except ValueError:
return False
if txt.startswith('https://arxiv.org/pdf/'):
arxiv_id = txt.split('/')[-1] # 2402.14207v2.pdf
txt = arxiv_id.split('v')[0] # 2402.14207
if ('.' in txt) and ('/' not in txt) and is_float(txt): # is arxiv ID
txt = 'https://arxiv.org/abs/' + txt.strip()
if ('.' in txt) and ('/' not in txt) and is_float(txt[:10]): # is arxiv ID
txt = 'https://arxiv.org/abs/' + txt[:10]
if not txt.startswith('https://arxiv.org'):
return txt, None # 是本地文件,跳过下载
# <-------------- inspect format ------------->
chatbot.append([f"检测到arxiv文档连接", '尝试下载 ...'])
yield from update_ui(chatbot=chatbot, history=history)
time.sleep(1) # 刷新界面
url_ = txt # https://arxiv.org/abs/1707.06690
if not txt.startswith('https://arxiv.org/abs/'):
msg = f"解析arxiv网址失败, 期望格式例如: https://arxiv.org/abs/1707.06690。实际得到格式: {url_}"
yield from update_ui_lastest_msg(msg, chatbot=chatbot, history=history) # 刷新界面
return msg, None
# <-------------- set format ------------->
arxiv_id = url_.split('/abs/')[-1]
if 'v' in arxiv_id: arxiv_id = arxiv_id[:10]
cached_translation_pdf = check_cached_translation_pdf(arxiv_id)
if cached_translation_pdf and allow_cache: return cached_translation_pdf, arxiv_id
extract_dst = pj(ARXIV_CACHE_DIR, arxiv_id, 'extract')
translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'e-print')
dst = pj(translation_dir, arxiv_id + '.tar')
os.makedirs(translation_dir, exist_ok=True)
# <-------------- download arxiv source file ------------->
def fix_url_and_download():
# for url_tar in [url_.replace('/abs/', '/e-print/'), url_.replace('/abs/', '/src/')]:
for url_tar in [url_.replace('/abs/', '/src/'), url_.replace('/abs/', '/e-print/')]:
proxies = get_conf('proxies')
r = requests.get(url_tar, proxies=proxies)
if r.status_code == 200:
with open(dst, 'wb+') as f:
f.write(r.content)
return True
return False
if os.path.exists(dst) and allow_cache:
yield from update_ui_lastest_msg(f"调用缓存 {arxiv_id}", chatbot=chatbot, history=history) # 刷新界面
success = True
else:
yield from update_ui_lastest_msg(f"开始下载 {arxiv_id}", chatbot=chatbot, history=history) # 刷新界面
success = fix_url_and_download()
yield from update_ui_lastest_msg(f"下载完成 {arxiv_id}", chatbot=chatbot, history=history) # 刷新界面
if not success:
yield from update_ui_lastest_msg(f"下载失败 {arxiv_id}", chatbot=chatbot, history=history)
raise tarfile.ReadError(f"论文下载失败 {arxiv_id}")
# <-------------- extract file ------------->
from toolbox import extract_archive
try:
extract_archive(file_path=dst, dest_dir=extract_dst)
except tarfile.ReadError:
os.remove(dst)
raise tarfile.ReadError(f"论文下载失败")
return extract_dst, arxiv_id
def pdf2tex_project(pdf_file_path, plugin_kwargs):
if plugin_kwargs["method"] == "MATHPIX":
# Mathpix API credentials
app_id, app_key = get_conf('MATHPIX_APPID', 'MATHPIX_APPKEY')
headers = {"app_id": app_id, "app_key": app_key}
# Step 1: Send PDF file for processing
options = {
"conversion_formats": {"tex.zip": True},
"math_inline_delimiters": ["$", "$"],
"rm_spaces": True
}
response = requests.post(url="https://api.mathpix.com/v3/pdf",
headers=headers,
data={"options_json": json.dumps(options)},
files={"file": open(pdf_file_path, "rb")})
if response.ok:
pdf_id = response.json()["pdf_id"]
logger.info(f"PDF processing initiated. PDF ID: {pdf_id}")
# Step 2: Check processing status
while True:
conversion_response = requests.get(f"https://api.mathpix.com/v3/pdf/{pdf_id}", headers=headers)
conversion_data = conversion_response.json()
if conversion_data["status"] == "completed":
logger.info("PDF processing completed.")
break
elif conversion_data["status"] == "error":
logger.info("Error occurred during processing.")
else:
logger.info(f"Processing status: {conversion_data['status']}")
time.sleep(5) # wait for a few seconds before checking again
# Step 3: Save results to local files
output_dir = os.path.join(os.path.dirname(pdf_file_path), 'mathpix_output')
if not os.path.exists(output_dir):
os.makedirs(output_dir)
url = f"https://api.mathpix.com/v3/pdf/{pdf_id}.tex"
response = requests.get(url, headers=headers)
file_name_wo_dot = '_'.join(os.path.basename(pdf_file_path).split('.')[:-1])
output_name = f"{file_name_wo_dot}.tex.zip"
output_path = os.path.join(output_dir, output_name)
with open(output_path, "wb") as output_file:
output_file.write(response.content)
logger.info(f"tex.zip file saved at: {output_path}")
import zipfile
unzip_dir = os.path.join(output_dir, file_name_wo_dot)
with zipfile.ZipFile(output_path, 'r') as zip_ref:
zip_ref.extractall(unzip_dir)
return unzip_dir
else:
logger.error(f"Error sending PDF for processing. Status code: {response.status_code}")
return None
else:
from crazy_functions.pdf_fns.parse_pdf_via_doc2x import 解析PDF_DOC2X_转Latex
unzip_dir = 解析PDF_DOC2X_转Latex(pdf_file_path)
return unzip_dir
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 插件主程序1 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
@CatchException
def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
# <-------------- information about this plugin ------------->
chatbot.append(["函数插件功能?",
"对整个Latex项目进行纠错, 用latex编译为PDF对修正处做高亮。函数插件贡献者: Binary-Husky。注意事项: 目前对机器学习类文献转化效果最好,其他类型文献转化效果未知。仅在Windows系统进行了测试,其他操作系统表现未知。"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# <-------------- more requirements ------------->
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
more_req = plugin_kwargs.get("advanced_arg", "")
_switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
# <-------------- check deps ------------->
try:
import glob, os, time, subprocess
subprocess.Popen(['pdflatex', '-version'])
from .latex_fns.latex_actions import Latex精细分解与转化, 编译Latex
except Exception as e:
chatbot.append([f"解析项目: {txt}",
f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# <-------------- clear history and read input ------------->
history = []
if os.path.exists(txt):
project_folder = txt
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到本地项目或无权访问: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
if len(file_manifest) == 0:
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到任何.tex文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# <-------------- if is a zip/tar file ------------->
project_folder = desend_to_extracted_folder_if_exist(project_folder)
# <-------------- move latex project away from temp folder ------------->
from shared_utils.fastapi_server import validate_path_safety
validate_path_safety(project_folder, chatbot.get_user())
project_folder = move_project(project_folder, arxiv_id=None)
# <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
if not os.path.exists(project_folder + '/merge_proofread_en.tex'):
yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
chatbot, history, system_prompt, mode='proofread_en',
switch_prompt=_switch_prompt_)
# <-------------- compile PDF ------------->
success = yield from 编译Latex(chatbot, history, main_file_original='merge',
main_file_modified='merge_proofread_en',
work_folder_original=project_folder, work_folder_modified=project_folder,
work_folder=project_folder)
# <-------------- zip PDF ------------->
zip_res = zip_result(project_folder)
if success:
chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
yield from update_ui(chatbot=chatbot, history=history);
time.sleep(1) # 刷新界面
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
else:
chatbot.append((f"失败了",
'虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 也是可读的, 您可以到Github Issue区, 用该压缩包+Conversation_To_File进行反馈 ...'))
yield from update_ui(chatbot=chatbot, history=history);
time.sleep(1) # 刷新界面
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
# <-------------- we are done ------------->
return success
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 插件主程序2 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
@CatchException
def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
# <-------------- information about this plugin ------------->
chatbot.append([
"函数插件功能?",
"对整个Latex项目进行翻译, 生成中文PDF。函数插件贡献者: Binary-Husky。注意事项: 此插件Windows支持最佳,Linux下必须使用Docker安装,详见项目主README.md。目前对机器学习类文献转化效果最好,其他类型文献转化效果未知。"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# <-------------- more requirements ------------->
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
more_req = plugin_kwargs.get("advanced_arg", "")
no_cache = ("--no-cache" in more_req)
if no_cache: more_req = more_req.replace("--no-cache", "").strip()
allow_gptac_cloud_io = ("--allow-cloudio" in more_req) # 从云端下载翻译结果,以及上传翻译结果到云端
if allow_gptac_cloud_io: more_req = more_req.replace("--allow-cloudio", "").strip()
allow_cache = not no_cache
_switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
# <-------------- check deps ------------->
try:
import glob, os, time, subprocess
subprocess.Popen(['pdflatex', '-version'])
from .latex_fns.latex_actions import Latex精细分解与转化, 编译Latex
except Exception as e:
chatbot.append([f"解析项目: {txt}",
f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# <-------------- clear history and read input ------------->
history = []
try:
txt, arxiv_id = yield from arxiv_download(chatbot, history, txt, allow_cache)
except tarfile.ReadError as e:
yield from update_ui_lastest_msg(
"无法自动下载该论文的Latex源码,请前往arxiv打开此论文下载页面,点other Formats,然后download source手动下载latex源码包。接下来调用本地Latex翻译插件即可。",
chatbot=chatbot, history=history)
return
if txt.endswith('.pdf'):
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"发现已经存在翻译好的PDF文档")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# #################################################################
if allow_gptac_cloud_io and arxiv_id:
# 访问 GPTAC学术云,查询云端是否存在该论文的翻译版本
from crazy_functions.latex_fns.latex_actions import check_gptac_cloud
success, downloaded = check_gptac_cloud(arxiv_id, chatbot)
if success:
chatbot.append([
f"检测到GPTAC云端存在翻译版本, 如果不满意翻译结果, 请禁用云端分享, 然后重新执行。",
None
])
yield from update_ui(chatbot=chatbot, history=history)
return
#################################################################
if os.path.exists(txt):
project_folder = txt
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到本地项目或无法处理: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
if len(file_manifest) == 0:
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到任何.tex文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# <-------------- if is a zip/tar file ------------->
project_folder = desend_to_extracted_folder_if_exist(project_folder)
# <-------------- move latex project away from temp folder ------------->
from shared_utils.fastapi_server import validate_path_safety
validate_path_safety(project_folder, chatbot.get_user())
project_folder = move_project(project_folder, arxiv_id)
# <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
if not os.path.exists(project_folder + '/merge_translate_zh.tex'):
yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
chatbot, history, system_prompt, mode='translate_zh',
switch_prompt=_switch_prompt_)
# <-------------- compile PDF ------------->
success = yield from 编译Latex(chatbot, history, main_file_original='merge',
main_file_modified='merge_translate_zh', mode='translate_zh',
work_folder_original=project_folder, work_folder_modified=project_folder,
work_folder=project_folder)
# <-------------- zip PDF ------------->
zip_res = zip_result(project_folder)
if success:
if allow_gptac_cloud_io and arxiv_id:
# 如果用户允许,我们将翻译好的arxiv论文PDF上传到GPTAC学术云
from crazy_functions.latex_fns.latex_actions import upload_to_gptac_cloud_if_user_allow
threading.Thread(target=upload_to_gptac_cloud_if_user_allow,
args=(chatbot, arxiv_id), daemon=True).start()
chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
yield from update_ui(chatbot=chatbot, history=history)
time.sleep(1) # 刷新界面
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
else:
chatbot.append((f"失败了",
'虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 您可以到Github Issue区, 用该压缩包进行反馈。如系统是Linux,请检查系统字体见Github wiki ...'))
yield from update_ui(chatbot=chatbot, history=history)
time.sleep(1) # 刷新界面
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
# <-------------- we are done ------------->
return success
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 插件主程序3 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
@CatchException
def PDF翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
# <-------------- information about this plugin ------------->
chatbot.append([
"函数插件功能?",
"将PDF转换为Latex项目,翻译为中文后重新编译为PDF。函数插件贡献者: Marroh。注意事项: 此插件Windows支持最佳,Linux下必须使用Docker安装,详见项目主README.md。目前对机器学习类文献转化效果最好,其他类型文献转化效果未知。"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# <-------------- more requirements ------------->
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
more_req = plugin_kwargs.get("advanced_arg", "")
no_cache = more_req.startswith("--no-cache")
if no_cache: more_req.lstrip("--no-cache")
allow_cache = not no_cache
_switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
# <-------------- check deps ------------->
try:
import glob, os, time, subprocess
subprocess.Popen(['pdflatex', '-version'])
from .latex_fns.latex_actions import Latex精细分解与转化, 编译Latex
except Exception as e:
chatbot.append([f"解析项目: {txt}",
f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# <-------------- clear history and read input ------------->
if os.path.exists(txt):
project_folder = txt
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到本地项目或无法处理: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.pdf', recursive=True)]
if len(file_manifest) == 0:
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到任何.pdf文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
if len(file_manifest) != 1:
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"不支持同时处理多个pdf文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
if plugin_kwargs.get("method", "") == 'MATHPIX':
app_id, app_key = get_conf('MATHPIX_APPID', 'MATHPIX_APPKEY')
if len(app_id) == 0 or len(app_key) == 0:
report_exception(chatbot, history, a="缺失 MATHPIX_APPID 和 MATHPIX_APPKEY。", b=f"请配置 MATHPIX_APPID 和 MATHPIX_APPKEY")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
if plugin_kwargs.get("method", "") == 'DOC2X':
app_id, app_key = "", ""
DOC2X_API_KEY = get_conf('DOC2X_API_KEY')
if len(DOC2X_API_KEY) == 0:
report_exception(chatbot, history, a="缺失 DOC2X_API_KEY。", b=f"请配置 DOC2X_API_KEY")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
hash_tag = map_file_to_sha256(file_manifest[0])
# # <-------------- check repeated pdf ------------->
# chatbot.append([f"检查PDF是否被重复上传", "正在检查..."])
# yield from update_ui(chatbot=chatbot, history=history)
# repeat, project_folder = check_repeat_upload(file_manifest[0], hash_tag)
# if repeat:
# yield from update_ui_lastest_msg(f"发现重复上传,请查收结果(压缩包)...", chatbot=chatbot, history=history)
# try:
# translate_pdf = [f for f in glob.glob(f'{project_folder}/**/merge_translate_zh.pdf', recursive=True)][0]
# promote_file_to_downloadzone(translate_pdf, rename_file=None, chatbot=chatbot)
# comparison_pdf = [f for f in glob.glob(f'{project_folder}/**/comparison.pdf', recursive=True)][0]
# promote_file_to_downloadzone(comparison_pdf, rename_file=None, chatbot=chatbot)
# zip_res = zip_result(project_folder)
# promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
# return
# except:
# report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"发现重复上传,但是无法找到相关文件")
# yield from update_ui(chatbot=chatbot, history=history)
# else:
# yield from update_ui_lastest_msg(f"未发现重复上传", chatbot=chatbot, history=history)
# <-------------- convert pdf into tex ------------->
chatbot.append([f"解析项目: {txt}", "正在将PDF转换为tex项目,请耐心等待..."])
yield from update_ui(chatbot=chatbot, history=history)
project_folder = pdf2tex_project(file_manifest[0], plugin_kwargs)
if project_folder is None:
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"PDF转换为tex项目失败")
yield from update_ui(chatbot=chatbot, history=history)
return False
# <-------------- translate latex file into Chinese ------------->
yield from update_ui_lastest_msg("正在tex项目将翻译为中文...", chatbot=chatbot, history=history)
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
if len(file_manifest) == 0:
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到任何.tex文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# <-------------- if is a zip/tar file ------------->
project_folder = desend_to_extracted_folder_if_exist(project_folder)
# <-------------- move latex project away from temp folder ------------->
from shared_utils.fastapi_server import validate_path_safety
validate_path_safety(project_folder, chatbot.get_user())
project_folder = move_project(project_folder)
# <-------------- set a hash tag for repeat-checking ------------->
with open(pj(project_folder, hash_tag + '.tag'), 'w') as f:
f.write(hash_tag)
f.close()
# <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
if not os.path.exists(project_folder + '/merge_translate_zh.tex'):
yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
chatbot, history, system_prompt, mode='translate_zh',
switch_prompt=_switch_prompt_)
# <-------------- compile PDF ------------->
yield from update_ui_lastest_msg("正在将翻译好的项目tex项目编译为PDF...", chatbot=chatbot, history=history)
success = yield from 编译Latex(chatbot, history, main_file_original='merge',
main_file_modified='merge_translate_zh', mode='translate_zh',
work_folder_original=project_folder, work_folder_modified=project_folder,
work_folder=project_folder)
# <-------------- zip PDF ------------->
zip_res = zip_result(project_folder)
if success:
chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
yield from update_ui(chatbot=chatbot, history=history);
time.sleep(1) # 刷新界面
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
else:
chatbot.append((f"失败了",
'虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 您可以到Github Issue区, 用该压缩包进行反馈。如系统是Linux,请检查系统字体见Github wiki ...'))
yield from update_ui(chatbot=chatbot, history=history);
time.sleep(1) # 刷新界面
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
# <-------------- we are done ------------->
return success

查看文件

@@ -0,0 +1,85 @@
from crazy_functions.Latex_Function import Latex翻译中文并重新编译PDF, PDF翻译中文并重新编译PDF
from crazy_functions.plugin_template.plugin_class_template import GptAcademicPluginTemplate, ArgProperty
class Arxiv_Localize(GptAcademicPluginTemplate):
def __init__(self):
"""
请注意`execute`会执行在不同的线程中,因此您在定义和使用类变量时,应当慎之又慎!
"""
pass
def define_arg_selection_menu(self):
"""
定义插件的二级选项菜单
第一个参数,名称`main_input`,参数`type`声明这是一个文本框,文本框上方显示`title`,文本框内部显示`description`,`default_value`为默认值;
第二个参数,名称`advanced_arg`,参数`type`声明这是一个文本框,文本框上方显示`title`,文本框内部显示`description`,`default_value`为默认值;
第三个参数,名称`allow_cache`,参数`type`声明这是一个下拉菜单,下拉菜单上方显示`title`+`description`,下拉菜单的选项为`options`,`default_value`为下拉菜单默认值;
"""
gui_definition = {
"main_input":
ArgProperty(title="ArxivID", description="输入Arxiv的ID或者网址", default_value="", type="string").model_dump_json(), # 主输入,自动从输入框同步
"advanced_arg":
ArgProperty(title="额外的翻译提示词",
description=r"如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 "
r"例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: "
r'If the term "agent" is used in this section, it should be translated to "智能体". ',
default_value="", type="string").model_dump_json(), # 高级参数输入区,自动同步
"allow_cache":
ArgProperty(title="是否允许从缓存中调取结果", options=["允许缓存", "从头执行"], default_value="允许缓存", description="", type="dropdown").model_dump_json(),
"allow_cloudio":
ArgProperty(title="是否允许从GPTAC学术云下载(或者上传)翻译结果(仅针对Arxiv论文)", options=["允许", "禁止"], default_value="禁止", description="共享文献,互助互利", type="dropdown").model_dump_json(),
}
return gui_definition
def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
"""
执行插件
"""
allow_cache = plugin_kwargs["allow_cache"]
allow_cloudio = plugin_kwargs["allow_cloudio"]
advanced_arg = plugin_kwargs["advanced_arg"]
if allow_cache == "从头执行": plugin_kwargs["advanced_arg"] = "--no-cache " + plugin_kwargs["advanced_arg"]
# 从云端下载翻译结果,以及上传翻译结果到云端;人人为我,我为人人。
if allow_cloudio == "允许": plugin_kwargs["advanced_arg"] = "--allow-cloudio " + plugin_kwargs["advanced_arg"]
yield from Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
class PDF_Localize(GptAcademicPluginTemplate):
def __init__(self):
"""
请注意`execute`会执行在不同的线程中,因此您在定义和使用类变量时,应当慎之又慎!
"""
pass
def define_arg_selection_menu(self):
"""
定义插件的二级选项菜单
"""
gui_definition = {
"main_input":
ArgProperty(title="PDF文件路径", description="未指定路径,请上传文件后,再点击该插件", default_value="", type="string").model_dump_json(), # 主输入,自动从输入框同步
"advanced_arg":
ArgProperty(title="额外的翻译提示词",
description=r"如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 "
r"例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: "
r'If the term "agent" is used in this section, it should be translated to "智能体". ',
default_value="", type="string").model_dump_json(), # 高级参数输入区,自动同步
"method":
ArgProperty(title="采用哪种方法执行转换", options=["MATHPIX", "DOC2X"], default_value="DOC2X", description="", type="dropdown").model_dump_json(),
}
return gui_definition
def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
"""
执行插件
"""
yield from PDF翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)

查看文件

@@ -1,6 +1,6 @@
from toolbox import update_ui, trimmed_format_exc, promote_file_to_downloadzone, get_log_folder
from toolbox import CatchException, report_exception, write_history_to_file, zip_folder
from loguru import logger
class PaperFileGroup():
def __init__(self):
@@ -33,7 +33,7 @@ class PaperFileGroup():
self.sp_file_index.append(index)
self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.tex")
print('Segmentation: done')
logger.info('Segmentation: done')
def merge_result(self):
self.file_result = ["" for _ in range(len(self.file_paths))]
for r, k in zip(self.sp_file_result, self.sp_file_index):
@@ -56,7 +56,7 @@ class PaperFileGroup():
def 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en', mode='polish'):
import time, os, re
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
# <-------- 读取Latex文件,删除其中的所有注释 ---------->
@@ -81,8 +81,8 @@ def 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
# <-------- 多线程润色开始 ---------->
if language == 'en':
if mode == 'polish':
inputs_array = ["Below is a section from an academic paper, polish this section to meet the academic standard, " +
"improve the grammar, clarity and overall readability, do not modify any latex command such as \section, \cite and equations:" +
inputs_array = [r"Below is a section from an academic paper, polish this section to meet the academic standard, " +
r"improve the grammar, clarity and overall readability, do not modify any latex command such as \section, \cite and equations:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
else:
inputs_array = [r"Below is a section from an academic paper, proofread this section." +
@@ -93,10 +93,10 @@ def 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
sys_prompt_array = ["You are a professional academic paper writer." for _ in range(n_split)]
elif language == 'zh':
if mode == 'polish':
inputs_array = [f"以下是一篇学术论文中的一段内容,请将此部分润色以满足学术标准,提高语法、清晰度和整体可读性,不要修改任何LaTeX命令,例如\section,\cite和方程式" +
inputs_array = [r"以下是一篇学术论文中的一段内容,请将此部分润色以满足学术标准,提高语法、清晰度和整体可读性,不要修改任何LaTeX命令,例如\section,\cite和方程式" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
else:
inputs_array = [f"以下是一篇学术论文中的一段内容,请对这部分内容进行语法矫正。不要修改任何LaTeX命令,例如\section,\cite和方程式" +
inputs_array = [r"以下是一篇学术论文中的一段内容,请对这部分内容进行语法矫正。不要修改任何LaTeX命令,例如\section,\cite和方程式" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
inputs_show_user_array = [f"润色 {f}" for f in pfg.sp_file_tag]
sys_prompt_array=["你是一位专业的中文学术论文作家。" for _ in range(n_split)]
@@ -122,7 +122,7 @@ def 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
pfg.write_result()
pfg.zip_result()
except:
print(trimmed_format_exc())
logger.error(trimmed_format_exc())
# <-------- 整理结果,退出 ---------->
create_report_file_name = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime()) + f"-chatgpt.polish.md"

查看文件

@@ -1,6 +1,6 @@
from toolbox import update_ui, promote_file_to_downloadzone
from toolbox import CatchException, report_exception, write_history_to_file
fast_debug = False
from loguru import logger
class PaperFileGroup():
def __init__(self):
@@ -33,11 +33,11 @@ class PaperFileGroup():
self.sp_file_index.append(index)
self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.tex")
print('Segmentation: done')
logger.info('Segmentation: done')
def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en'):
import time, os, re
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
# <-------- 读取Latex文件,删除其中的所有注释 ---------->
pfg = PaperFileGroup()

查看文件

@@ -1,313 +0,0 @@
from toolbox import update_ui, trimmed_format_exc, get_conf, get_log_folder, promote_file_to_downloadzone
from toolbox import CatchException, report_exception, update_ui_lastest_msg, zip_result, gen_time_str
from functools import partial
import glob, os, requests, time, tarfile
pj = os.path.join
ARXIV_CACHE_DIR = os.path.expanduser(f"~/arxiv_cache/")
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 工具函数 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
# 专业词汇声明 = 'If the term "agent" is used in this section, it should be translated to "智能体". '
def switch_prompt(pfg, mode, more_requirement):
"""
Generate prompts and system prompts based on the mode for proofreading or translating.
Args:
- pfg: Proofreader or Translator instance.
- mode: A string specifying the mode, either 'proofread' or 'translate_zh'.
Returns:
- inputs_array: A list of strings containing prompts for users to respond to.
- sys_prompt_array: A list of strings containing prompts for system prompts.
"""
n_split = len(pfg.sp_file_contents)
if mode == 'proofread_en':
inputs_array = [r"Below is a section from an academic paper, proofread this section." +
r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " + more_requirement +
r"Answer me only with the revised text:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
sys_prompt_array = ["You are a professional academic paper writer." for _ in range(n_split)]
elif mode == 'translate_zh':
inputs_array = [r"Below is a section from an English academic paper, translate it into Chinese. " + more_requirement +
r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " +
r"Answer me only with the translated text:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
sys_prompt_array = ["You are a professional translator." for _ in range(n_split)]
else:
assert False, "未知指令"
return inputs_array, sys_prompt_array
def desend_to_extracted_folder_if_exist(project_folder):
"""
Descend into the extracted folder if it exists, otherwise return the original folder.
Args:
- project_folder: A string specifying the folder path.
Returns:
- A string specifying the path to the extracted folder, or the original folder if there is no extracted folder.
"""
maybe_dir = [f for f in glob.glob(f'{project_folder}/*') if os.path.isdir(f)]
if len(maybe_dir) == 0: return project_folder
if maybe_dir[0].endswith('.extract'): return maybe_dir[0]
return project_folder
def move_project(project_folder, arxiv_id=None):
"""
Create a new work folder and copy the project folder to it.
Args:
- project_folder: A string specifying the folder path of the project.
Returns:
- A string specifying the path to the new work folder.
"""
import shutil, time
time.sleep(2) # avoid time string conflict
if arxiv_id is not None:
new_workfolder = pj(ARXIV_CACHE_DIR, arxiv_id, 'workfolder')
else:
new_workfolder = f'{get_log_folder()}/{gen_time_str()}'
try:
shutil.rmtree(new_workfolder)
except:
pass
# align subfolder if there is a folder wrapper
items = glob.glob(pj(project_folder,'*'))
items = [item for item in items if os.path.basename(item)!='__MACOSX']
if len(glob.glob(pj(project_folder,'*.tex'))) == 0 and len(items) == 1:
if os.path.isdir(items[0]): project_folder = items[0]
shutil.copytree(src=project_folder, dst=new_workfolder)
return new_workfolder
def arxiv_download(chatbot, history, txt, allow_cache=True):
def check_cached_translation_pdf(arxiv_id):
translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'translation')
if not os.path.exists(translation_dir):
os.makedirs(translation_dir)
target_file = pj(translation_dir, 'translate_zh.pdf')
if os.path.exists(target_file):
promote_file_to_downloadzone(target_file, rename_file=None, chatbot=chatbot)
target_file_compare = pj(translation_dir, 'comparison.pdf')
if os.path.exists(target_file_compare):
promote_file_to_downloadzone(target_file_compare, rename_file=None, chatbot=chatbot)
return target_file
return False
def is_float(s):
try:
float(s)
return True
except ValueError:
return False
if ('.' in txt) and ('/' not in txt) and is_float(txt): # is arxiv ID
txt = 'https://arxiv.org/abs/' + txt.strip()
if ('.' in txt) and ('/' not in txt) and is_float(txt[:10]): # is arxiv ID
txt = 'https://arxiv.org/abs/' + txt[:10]
if not txt.startswith('https://arxiv.org'):
return txt, None # 是本地文件,跳过下载
# <-------------- inspect format ------------->
chatbot.append([f"检测到arxiv文档连接", '尝试下载 ...'])
yield from update_ui(chatbot=chatbot, history=history)
time.sleep(1) # 刷新界面
url_ = txt # https://arxiv.org/abs/1707.06690
if not txt.startswith('https://arxiv.org/abs/'):
msg = f"解析arxiv网址失败, 期望格式例如: https://arxiv.org/abs/1707.06690。实际得到格式: {url_}"
yield from update_ui_lastest_msg(msg, chatbot=chatbot, history=history) # 刷新界面
return msg, None
# <-------------- set format ------------->
arxiv_id = url_.split('/abs/')[-1]
if 'v' in arxiv_id: arxiv_id = arxiv_id[:10]
cached_translation_pdf = check_cached_translation_pdf(arxiv_id)
if cached_translation_pdf and allow_cache: return cached_translation_pdf, arxiv_id
url_tar = url_.replace('/abs/', '/e-print/')
translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'e-print')
extract_dst = pj(ARXIV_CACHE_DIR, arxiv_id, 'extract')
os.makedirs(translation_dir, exist_ok=True)
# <-------------- download arxiv source file ------------->
dst = pj(translation_dir, arxiv_id+'.tar')
if os.path.exists(dst):
yield from update_ui_lastest_msg("调用缓存", chatbot=chatbot, history=history) # 刷新界面
else:
yield from update_ui_lastest_msg("开始下载", chatbot=chatbot, history=history) # 刷新界面
proxies = get_conf('proxies')
r = requests.get(url_tar, proxies=proxies)
with open(dst, 'wb+') as f:
f.write(r.content)
# <-------------- extract file ------------->
yield from update_ui_lastest_msg("下载完成", chatbot=chatbot, history=history) # 刷新界面
from toolbox import extract_archive
extract_archive(file_path=dst, dest_dir=extract_dst)
return extract_dst, arxiv_id
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 插件主程序1 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
@CatchException
def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
# <-------------- information about this plugin ------------->
chatbot.append([ "函数插件功能?",
"对整个Latex项目进行纠错, 用latex编译为PDF对修正处做高亮。函数插件贡献者: Binary-Husky。注意事项: 目前仅支持GPT3.5/GPT4,其他模型转化效果未知。目前对机器学习类文献转化效果最好,其他类型文献转化效果未知。仅在Windows系统进行了测试,其他操作系统表现未知。"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# <-------------- more requirements ------------->
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
more_req = plugin_kwargs.get("advanced_arg", "")
_switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
# <-------------- check deps ------------->
try:
import glob, os, time, subprocess
subprocess.Popen(['pdflatex', '-version'])
from .latex_fns.latex_actions import Latex精细分解与转化, 编译Latex
except Exception as e:
chatbot.append([ f"解析项目: {txt}",
f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# <-------------- clear history and read input ------------->
history = []
if os.path.exists(txt):
project_folder = txt
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
if len(file_manifest) == 0:
report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.tex文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# <-------------- if is a zip/tar file ------------->
project_folder = desend_to_extracted_folder_if_exist(project_folder)
# <-------------- move latex project away from temp folder ------------->
project_folder = move_project(project_folder, arxiv_id=None)
# <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
if not os.path.exists(project_folder + '/merge_proofread_en.tex'):
yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
chatbot, history, system_prompt, mode='proofread_en', switch_prompt=_switch_prompt_)
# <-------------- compile PDF ------------->
success = yield from 编译Latex(chatbot, history, main_file_original='merge', main_file_modified='merge_proofread_en',
work_folder_original=project_folder, work_folder_modified=project_folder, work_folder=project_folder)
# <-------------- zip PDF ------------->
zip_res = zip_result(project_folder)
if success:
chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
else:
chatbot.append((f"失败了", '虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 也是可读的, 您可以到Github Issue区, 用该压缩包+对话历史存档进行反馈 ...'))
yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
# <-------------- we are done ------------->
return success
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 插件主程序2 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
@CatchException
def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
# <-------------- information about this plugin ------------->
chatbot.append([
"函数插件功能?",
"对整个Latex项目进行翻译, 生成中文PDF。函数插件贡献者: Binary-Husky。注意事项: 此插件Windows支持最佳,Linux下必须使用Docker安装,详见项目主README.md。目前仅支持GPT3.5/GPT4,其他模型转化效果未知。目前对机器学习类文献转化效果最好,其他类型文献转化效果未知。"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# <-------------- more requirements ------------->
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
more_req = plugin_kwargs.get("advanced_arg", "")
no_cache = more_req.startswith("--no-cache")
if no_cache: more_req.lstrip("--no-cache")
allow_cache = not no_cache
_switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
# <-------------- check deps ------------->
try:
import glob, os, time, subprocess
subprocess.Popen(['pdflatex', '-version'])
from .latex_fns.latex_actions import Latex精细分解与转化, 编译Latex
except Exception as e:
chatbot.append([ f"解析项目: {txt}",
f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# <-------------- clear history and read input ------------->
history = []
try:
txt, arxiv_id = yield from arxiv_download(chatbot, history, txt, allow_cache)
except tarfile.ReadError as e:
yield from update_ui_lastest_msg(
"无法自动下载该论文的Latex源码,请前往arxiv打开此论文下载页面,点other Formats,然后download source手动下载latex源码包。接下来调用本地Latex翻译插件即可。",
chatbot=chatbot, history=history)
return
if txt.endswith('.pdf'):
report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"发现已经存在翻译好的PDF文档")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
if os.path.exists(txt):
project_folder = txt
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无法处理: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
if len(file_manifest) == 0:
report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.tex文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# <-------------- if is a zip/tar file ------------->
project_folder = desend_to_extracted_folder_if_exist(project_folder)
# <-------------- move latex project away from temp folder ------------->
project_folder = move_project(project_folder, arxiv_id)
# <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
if not os.path.exists(project_folder + '/merge_translate_zh.tex'):
yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
chatbot, history, system_prompt, mode='translate_zh', switch_prompt=_switch_prompt_)
# <-------------- compile PDF ------------->
success = yield from 编译Latex(chatbot, history, main_file_original='merge', main_file_modified='merge_translate_zh', mode='translate_zh',
work_folder_original=project_folder, work_folder_modified=project_folder, work_folder=project_folder)
# <-------------- zip PDF ------------->
zip_res = zip_result(project_folder)
if success:
chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
else:
chatbot.append((f"失败了", '虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 您可以到Github Issue区, 用该压缩包进行反馈。如系统是Linux,请检查系统字体见Github wiki ...'))
yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # 刷新界面
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
# <-------------- we are done ------------->
return success

查看文件

@@ -1,5 +1,6 @@
import glob, time, os, re, logging
from toolbox import update_ui, trimmed_format_exc, gen_time_str, disable_auto_promotion
import glob, shutil, os, re
from loguru import logger
from toolbox import update_ui, trimmed_format_exc, gen_time_str
from toolbox import CatchException, report_exception, get_log_folder
from toolbox import write_history_to_file, promote_file_to_downloadzone
fast_debug = False
@@ -18,7 +19,7 @@ class PaperFileGroup():
def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
self.get_token_num = get_token_num
def run_file_split(self, max_token_limit=1900):
def run_file_split(self, max_token_limit=2048):
"""
将长文本分离开来
"""
@@ -34,7 +35,7 @@ class PaperFileGroup():
self.sp_file_contents.append(segment)
self.sp_file_index.append(index)
self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.md")
logging.info('Segmentation: done')
logger.info('Segmentation: done')
def merge_result(self):
self.file_result = ["" for _ in range(len(self.file_paths))]
@@ -51,7 +52,7 @@ class PaperFileGroup():
return manifest
def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en'):
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
# <-------- 读取Markdown文件,删除其中的所有注释 ---------->
pfg = PaperFileGroup()
@@ -64,25 +65,25 @@ def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
pfg.file_contents.append(file_content)
# <-------- 拆分过长的Markdown文件 ---------->
pfg.run_file_split(max_token_limit=1500)
pfg.run_file_split(max_token_limit=1024)
n_split = len(pfg.sp_file_contents)
# <-------- 多线程翻译开始 ---------->
if language == 'en->zh':
inputs_array = ["This is a Markdown file, translate it into Chinese, do not modify any existing Markdown commands:" +
inputs_array = ["This is a Markdown file, translate it into Chinese, do NOT modify any existing Markdown commands, do NOT use code wrapper (```), ONLY answer me with translated results:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
inputs_show_user_array = [f"翻译 {f}" for f in pfg.sp_file_tag]
sys_prompt_array = ["You are a professional academic paper translator." for _ in range(n_split)]
sys_prompt_array = ["You are a professional academic paper translator." + plugin_kwargs.get("additional_prompt", "") for _ in range(n_split)]
elif language == 'zh->en':
inputs_array = [f"This is a Markdown file, translate it into English, do not modify any existing Markdown commands:" +
inputs_array = [f"This is a Markdown file, translate it into English, do NOT modify any existing Markdown commands, do NOT use code wrapper (```), ONLY answer me with translated results:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
inputs_show_user_array = [f"翻译 {f}" for f in pfg.sp_file_tag]
sys_prompt_array = ["You are a professional academic paper translator." for _ in range(n_split)]
sys_prompt_array = ["You are a professional academic paper translator." + plugin_kwargs.get("additional_prompt", "") for _ in range(n_split)]
else:
inputs_array = [f"This is a Markdown file, translate it into {language}, do not modify any existing Markdown commands, only answer me with translated results:" +
inputs_array = [f"This is a Markdown file, translate it into {language}, do NOT modify any existing Markdown commands, do NOT use code wrapper (```), ONLY answer me with translated results:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
inputs_show_user_array = [f"翻译 {f}" for f in pfg.sp_file_tag]
sys_prompt_array = ["You are a professional academic paper translator." for _ in range(n_split)]
sys_prompt_array = ["You are a professional academic paper translator." + plugin_kwargs.get("additional_prompt", "") for _ in range(n_split)]
gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
inputs_array=inputs_array,
@@ -99,9 +100,14 @@ def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
for i_say, gpt_say in zip(gpt_response_collection[0::2], gpt_response_collection[1::2]):
pfg.sp_file_result.append(gpt_say)
pfg.merge_result()
pfg.write_result(language)
output_file_arr = pfg.write_result(language)
for output_file in output_file_arr:
promote_file_to_downloadzone(output_file, chatbot=chatbot)
if 'markdown_expected_output_path' in plugin_kwargs:
expected_f_name = plugin_kwargs['markdown_expected_output_path']
shutil.copyfile(output_file, expected_f_name)
except:
logging.error(trimmed_format_exc())
logger.error(trimmed_format_exc())
# <-------- 整理结果,退出 ---------->
create_report_file_name = gen_time_str() + f"-chatgpt.md"
@@ -121,7 +127,7 @@ def get_files_from_everything(txt, preference=''):
proxies = get_conf('proxies')
# 网络的远程文件
if preference == 'Github':
logging.info('正在从github下载资源 ...')
logger.info('正在从github下载资源 ...')
if not txt.endswith('.md'):
# Make a request to the GitHub API to retrieve the repository information
url = txt.replace("https://github.com/", "https://api.github.com/repos/") + '/readme'
@@ -159,7 +165,6 @@ def Markdown英译中(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_p
"函数插件功能?",
"对整个Markdown项目进行翻译。函数插件贡献者: Binary-Husky"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
disable_auto_promotion(chatbot)
# 尝试导入依赖,如果缺少依赖,则给出安装建议
try:
@@ -199,7 +204,6 @@ def Markdown中译英(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_p
"函数插件功能?",
"对整个Markdown项目进行翻译。函数插件贡献者: Binary-Husky"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
disable_auto_promotion(chatbot)
# 尝试导入依赖,如果缺少依赖,则给出安装建议
try:
@@ -232,7 +236,6 @@ def Markdown翻译指定语言(txt, llm_kwargs, plugin_kwargs, chatbot, history,
"函数插件功能?",
"对整个Markdown项目进行翻译。函数插件贡献者: Binary-Husky"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
disable_auto_promotion(chatbot)
# 尝试导入依赖,如果缺少依赖,则给出安装建议
try:

查看文件

@@ -0,0 +1,83 @@
from toolbox import CatchException, check_packages, get_conf
from toolbox import update_ui, update_ui_lastest_msg, disable_auto_promotion
from toolbox import trimmed_format_exc_markdown
from crazy_functions.crazy_utils import get_files_from_everything
from crazy_functions.pdf_fns.parse_pdf import get_avail_grobid_url
from crazy_functions.pdf_fns.parse_pdf_via_doc2x import 解析PDF_基于DOC2X
from crazy_functions.pdf_fns.parse_pdf_legacy import 解析PDF_简单拆解
from crazy_functions.pdf_fns.parse_pdf_grobid import 解析PDF_基于GROBID
from shared_utils.colorful import *
@CatchException
def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
disable_auto_promotion(chatbot)
# 基本信息:功能、贡献者
chatbot.append([None, "插件功能批量翻译PDF文档。函数插件贡献者: Binary-Husky"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# 尝试导入依赖,如果缺少依赖,则给出安装建议
try:
check_packages(["fitz", "tiktoken", "scipdf"])
except:
chatbot.append([None, f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade pymupdf tiktoken scipdf_parser```。"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# 清空历史,以免输入溢出
history = []
success, file_manifest, project_folder = get_files_from_everything(txt, type='.pdf')
# 检测输入参数,如没有给定输入参数,直接退出
if (not success) and txt == "": txt = '空空如也的输入栏。提示请先上传文件把PDF文件拖入对话'
# 如果没找到任何文件
if len(file_manifest) == 0:
chatbot.append([None, f"找不到任何.pdf拓展名的文件: {txt}"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# 开始正式执行任务
method = plugin_kwargs.get("pdf_parse_method", None)
if method == "DOC2X":
# ------- 第一种方法,效果最好,但是需要DOC2X服务 -------
DOC2X_API_KEY = get_conf("DOC2X_API_KEY")
if len(DOC2X_API_KEY) != 0:
try:
yield from 解析PDF_基于DOC2X(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, DOC2X_API_KEY, user_request)
return
except:
chatbot.append([None, f"DOC2X服务不可用,现在将执行效果稍差的旧版代码。{trimmed_format_exc_markdown()}"])
yield from update_ui(chatbot=chatbot, history=history)
if method == "GROBID":
# ------- 第二种方法,效果次优 -------
grobid_url = get_avail_grobid_url()
if grobid_url is not None:
yield from 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, grobid_url)
return
if method == "ClASSIC":
# ------- 第三种方法,早期代码,效果不理想 -------
yield from update_ui_lastest_msg("GROBID服务不可用,请检查config中的GROBID_URL。作为替代,现在将执行效果稍差的旧版代码。", chatbot, history, delay=3)
yield from 解析PDF_简单拆解(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
return
if method is None:
# ------- 以上三种方法都试一遍 -------
DOC2X_API_KEY = get_conf("DOC2X_API_KEY")
if len(DOC2X_API_KEY) != 0:
try:
yield from 解析PDF_基于DOC2X(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, DOC2X_API_KEY, user_request)
return
except:
chatbot.append([None, f"DOC2X服务不可用,正在尝试GROBID。{trimmed_format_exc_markdown()}"])
yield from update_ui(chatbot=chatbot, history=history)
grobid_url = get_avail_grobid_url()
if grobid_url is not None:
yield from 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, grobid_url)
return
yield from update_ui_lastest_msg("GROBID服务不可用,请检查config中的GROBID_URL。作为替代,现在将执行效果稍差的旧版代码。", chatbot, history, delay=3)
yield from 解析PDF_简单拆解(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
return

查看文件

@@ -0,0 +1,33 @@
from crazy_functions.plugin_template.plugin_class_template import GptAcademicPluginTemplate, ArgProperty
from .PDF_Translate import 批量翻译PDF文档
class PDF_Tran(GptAcademicPluginTemplate):
def __init__(self):
"""
请注意`execute`会执行在不同的线程中,因此您在定义和使用类变量时,应当慎之又慎!
"""
pass
def define_arg_selection_menu(self):
"""
定义插件的二级选项菜单
"""
gui_definition = {
"main_input":
ArgProperty(title="PDF文件路径", description="未指定路径,请上传文件后,再点击该插件", default_value="", type="string").model_dump_json(), # 主输入,自动从输入框同步
"additional_prompt":
ArgProperty(title="额外提示词", description="例如:对专有名词、翻译语气等方面的要求", default_value="", type="string").model_dump_json(), # 高级参数输入区,自动同步
"pdf_parse_method":
ArgProperty(title="PDF解析方法", options=["DOC2X", "GROBID", "ClASSIC"], description="", default_value="GROBID", type="dropdown").model_dump_json(),
}
return gui_definition
def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
"""
执行插件
"""
main_input = plugin_kwargs["main_input"]
additional_prompt = plugin_kwargs["additional_prompt"]
pdf_parse_method = plugin_kwargs["pdf_parse_method"]
yield from 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)

查看文件

@@ -0,0 +1,92 @@
from toolbox import CatchException, update_ui, get_conf, get_log_folder, update_ui_lastest_msg
from crazy_functions.crazy_utils import input_clipping
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
RAG_WORKER_REGISTER = {}
MAX_HISTORY_ROUND = 5
MAX_CONTEXT_TOKEN_LIMIT = 4096
REMEMBER_PREVIEW = 1000
@CatchException
def Rag问答(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
# import vector store lib
VECTOR_STORE_TYPE = "Milvus"
if VECTOR_STORE_TYPE == "Milvus":
try:
from crazy_functions.rag_fns.milvus_worker import MilvusRagWorker as LlamaIndexRagWorker
except:
VECTOR_STORE_TYPE = "Simple"
if VECTOR_STORE_TYPE == "Simple":
from crazy_functions.rag_fns.llama_index_worker import LlamaIndexRagWorker
# 1. we retrieve rag worker from global context
user_name = chatbot.get_user()
checkpoint_dir = get_log_folder(user_name, plugin_name='experimental_rag')
if user_name in RAG_WORKER_REGISTER:
rag_worker = RAG_WORKER_REGISTER[user_name]
else:
rag_worker = RAG_WORKER_REGISTER[user_name] = LlamaIndexRagWorker(
user_name,
llm_kwargs,
checkpoint_dir=checkpoint_dir,
auto_load_checkpoint=True)
current_context = f"{VECTOR_STORE_TYPE} @ {checkpoint_dir}"
tip = "提示输入“清空向量数据库”可以清空RAG向量数据库"
if txt == "清空向量数据库":
chatbot.append([txt, f'正在清空 ({current_context}) ...'])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
rag_worker.purge()
yield from update_ui_lastest_msg('已清空', chatbot, history, delay=0) # 刷新界面
return
chatbot.append([txt, f'正在召回知识 ({current_context}) ...'])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# 2. clip history to reduce token consumption
# 2-1. reduce chat round
txt_origin = txt
if len(history) > MAX_HISTORY_ROUND * 2:
history = history[-(MAX_HISTORY_ROUND * 2):]
txt_clip, history, flags = input_clipping(txt, history, max_token_limit=MAX_CONTEXT_TOKEN_LIMIT, return_clip_flags=True)
input_is_clipped_flag = (flags["original_input_len"] != flags["clipped_input_len"])
# 2-2. if input is clipped, add input to vector store before retrieve
if input_is_clipped_flag:
yield from update_ui_lastest_msg('检测到长输入, 正在向量化 ...', chatbot, history, delay=0) # 刷新界面
# save input to vector store
rag_worker.add_text_to_vector_store(txt_origin)
yield from update_ui_lastest_msg('向量化完成 ...', chatbot, history, delay=0) # 刷新界面
if len(txt_origin) > REMEMBER_PREVIEW:
HALF = REMEMBER_PREVIEW//2
i_say_to_remember = txt[:HALF] + f" ...\n...(省略{len(txt_origin)-REMEMBER_PREVIEW}字)...\n... " + txt[-HALF:]
if (flags["original_input_len"] - flags["clipped_input_len"]) > HALF:
txt_clip = txt_clip + f" ...\n...(省略{len(txt_origin)-len(txt_clip)-HALF}字)...\n... " + txt[-HALF:]
else:
pass
i_say = txt_clip
else:
i_say_to_remember = i_say = txt_clip
else:
i_say_to_remember = i_say = txt_clip
# 3. we search vector store and build prompts
nodes = rag_worker.retrieve_from_store_with_query(i_say)
prompt = rag_worker.build_prompt(query=i_say, nodes=nodes)
# 4. it is time to query llms
if len(chatbot) != 0: chatbot.pop(-1) # pop temp chat, because we are going to add them again inside `request_gpt_model_in_new_thread_with_ui_alive`
model_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=prompt, inputs_show_user=i_say,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
sys_prompt=system_prompt,
retry_times_at_unknown_error=0
)
# 5. remember what has been asked / answered
yield from update_ui_lastest_msg(model_say + '</br></br>' + f'对话记忆中, 请稍等 ({current_context}) ...', chatbot, history, delay=0.5) # 刷新界面
rag_worker.remember_qa(i_say_to_remember, model_say)
history.extend([i_say, model_say])
yield from update_ui_lastest_msg(model_say, chatbot, history, delay=0, msg=tip) # 刷新界面

查看文件

@@ -0,0 +1,167 @@
import pickle, os, random
from toolbox import CatchException, update_ui, get_conf, get_log_folder, update_ui_lastest_msg
from crazy_functions.crazy_utils import input_clipping
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from request_llms.bridge_all import predict_no_ui_long_connection
from crazy_functions.json_fns.select_tool import structure_output, select_tool
from pydantic import BaseModel, Field
from loguru import logger
from typing import List
SOCIAL_NETWOK_WORKER_REGISTER = {}
class SocialNetwork():
def __init__(self):
self.people = []
class SaveAndLoad():
def __init__(self, user_name, llm_kwargs, auto_load_checkpoint=True, checkpoint_dir=None) -> None:
self.user_name = user_name
self.checkpoint_dir = checkpoint_dir
if auto_load_checkpoint:
self.social_network = self.load_from_checkpoint(checkpoint_dir)
else:
self.social_network = SocialNetwork()
def does_checkpoint_exist(self, checkpoint_dir=None):
import os, glob
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
if not os.path.exists(checkpoint_dir): return False
if len(glob.glob(os.path.join(checkpoint_dir, "social_network.pkl"))) == 0: return False
return True
def save_to_checkpoint(self, checkpoint_dir=None):
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
with open(os.path.join(checkpoint_dir, 'social_network.pkl'), "wb+") as f:
pickle.dump(self.social_network, f)
return
def load_from_checkpoint(self, checkpoint_dir=None):
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
if self.does_checkpoint_exist(checkpoint_dir=checkpoint_dir):
with open(os.path.join(checkpoint_dir, 'social_network.pkl'), "rb") as f:
social_network = pickle.load(f)
return social_network
else:
return SocialNetwork()
class Friend(BaseModel):
friend_name: str = Field(description="name of a friend")
friend_description: str = Field(description="description of a friend (everything about this friend)")
friend_relationship: str = Field(description="The relationship with a friend (e.g. friend, family, colleague)")
class FriendList(BaseModel):
friends_list: List[Friend] = Field(description="The list of friends")
class SocialNetworkWorker(SaveAndLoad):
def ai_socail_advice(self, prompt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type):
pass
def ai_remove_friend(self, prompt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type):
pass
def ai_list_friends(self, prompt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type):
pass
def ai_add_multi_friends(self, prompt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type):
friend, err_msg = structure_output(
txt=prompt,
prompt="根据提示, 解析多个联系人的身份信息\n\n",
err_msg=f"不能理解该联系人",
run_gpt_fn=run_gpt_fn,
pydantic_cls=FriendList
)
if friend.friends_list:
for f in friend.friends_list:
self.add_friend(f)
msg = f"成功添加{len(friend.friends_list)}个联系人: {str(friend.friends_list)}"
yield from update_ui_lastest_msg(lastmsg=msg, chatbot=chatbot, history=history, delay=0)
def run(self, txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
prompt = txt
run_gpt_fn = lambda inputs, sys_prompt: predict_no_ui_long_connection(inputs=inputs, llm_kwargs=llm_kwargs, history=[], sys_prompt=sys_prompt, observe_window=[])
self.tools_to_select = {
"SocialAdvice":{
"explain_to_llm": "如果用户希望获取社交指导,调用SocialAdvice生成一些社交建议",
"callback": self.ai_socail_advice,
},
"AddFriends":{
"explain_to_llm": "如果用户给出了联系人,调用AddMultiFriends把联系人添加到数据库",
"callback": self.ai_add_multi_friends,
},
"RemoveFriend":{
"explain_to_llm": "如果用户希望移除某个联系人,调用RemoveFriend",
"callback": self.ai_remove_friend,
},
"ListFriends":{
"explain_to_llm": "如果用户列举联系人,调用ListFriends",
"callback": self.ai_list_friends,
}
}
try:
Explaination = '\n'.join([f'{k}: {v["explain_to_llm"]}' for k, v in self.tools_to_select.items()])
class UserSociaIntention(BaseModel):
intention_type: str = Field(
description=
f"The type of user intention. You must choose from {self.tools_to_select.keys()}.\n\n"
f"Explaination:\n{Explaination}",
default="SocialAdvice"
)
pydantic_cls_instance, err_msg = select_tool(
prompt=txt,
run_gpt_fn=run_gpt_fn,
pydantic_cls=UserSociaIntention
)
except Exception as e:
yield from update_ui_lastest_msg(
lastmsg=f"无法理解用户意图 {err_msg}",
chatbot=chatbot,
history=history,
delay=0
)
return
intention_type = pydantic_cls_instance.intention_type
intention_callback = self.tools_to_select[pydantic_cls_instance.intention_type]['callback']
yield from intention_callback(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, run_gpt_fn, intention_type)
def add_friend(self, friend):
# check whether the friend is already in the social network
for f in self.social_network.people:
if f.friend_name == friend.friend_name:
f.friend_description = friend.friend_description
f.friend_relationship = friend.friend_relationship
logger.info(f"Repeated friend, update info: {friend}")
return
logger.info(f"Add a new friend: {friend}")
self.social_network.people.append(friend)
return
@CatchException
def I人助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
# 1. we retrieve worker from global context
user_name = chatbot.get_user()
checkpoint_dir=get_log_folder(user_name, plugin_name='experimental_rag')
if user_name in SOCIAL_NETWOK_WORKER_REGISTER:
social_network_worker = SOCIAL_NETWOK_WORKER_REGISTER[user_name]
else:
social_network_worker = SOCIAL_NETWOK_WORKER_REGISTER[user_name] = SocialNetworkWorker(
user_name,
llm_kwargs,
checkpoint_dir=checkpoint_dir,
auto_load_checkpoint=True
)
# 2. save
yield from social_network_worker.run(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
social_network_worker.save_to_checkpoint(checkpoint_dir)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面

查看文件

@@ -1,12 +1,12 @@
from toolbox import update_ui, promote_file_to_downloadzone, disable_auto_promotion
from toolbox import update_ui, promote_file_to_downloadzone
from toolbox import CatchException, report_exception, write_history_to_file
from .crazy_utils import input_clipping
from shared_utils.fastapi_server import validate_path_safety
from crazy_functions.crazy_utils import input_clipping
def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
import os, copy
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
disable_auto_promotion(chatbot=chatbot)
from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
summary_batch_isolation = True
inputs_array = []
@@ -23,7 +23,7 @@ def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
file_content = f.read()
prefix = "接下来请你逐文件分析下面的工程" if index==0 else ""
i_say = prefix + f'请对下面的程序文件做一个概述文件名是{os.path.relpath(fp, project_folder)},文件代码是 ```{file_content}```'
i_say_show_user = prefix + f'[{index}/{len(file_manifest)}] 请对下面的程序文件做一个概述: {fp}'
i_say_show_user = prefix + f'[{index+1}/{len(file_manifest)}] 请对下面的程序文件做一个概述: {fp}'
# 装载请求内容
inputs_array.append(i_say)
inputs_show_user_array.append(i_say_show_user)
@@ -128,6 +128,7 @@ def 解析一个Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
import glob, os
if os.path.exists(txt):
project_folder = txt
validate_path_safety(project_folder, chatbot.get_user())
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
@@ -146,6 +147,7 @@ def 解析一个Matlab项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
import glob, os
if os.path.exists(txt):
project_folder = txt
validate_path_safety(project_folder, chatbot.get_user())
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a = f"解析Matlab项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
@@ -164,6 +166,7 @@ def 解析一个C项目的头文件(txt, llm_kwargs, plugin_kwargs, chatbot, his
import glob, os
if os.path.exists(txt):
project_folder = txt
validate_path_safety(project_folder, chatbot.get_user())
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
@@ -184,6 +187,7 @@ def 解析一个C项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system
import glob, os
if os.path.exists(txt):
project_folder = txt
validate_path_safety(project_folder, chatbot.get_user())
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
@@ -206,6 +210,7 @@ def 解析一个Java项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, sys
import glob, os
if os.path.exists(txt):
project_folder = txt
validate_path_safety(project_folder, chatbot.get_user())
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到本地项目或无权访问: {txt}")
@@ -228,6 +233,7 @@ def 解析一个前端项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
import glob, os
if os.path.exists(txt):
project_folder = txt
validate_path_safety(project_folder, chatbot.get_user())
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到本地项目或无权访问: {txt}")
@@ -257,6 +263,7 @@ def 解析一个Golang项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
import glob, os
if os.path.exists(txt):
project_folder = txt
validate_path_safety(project_folder, chatbot.get_user())
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到本地项目或无权访问: {txt}")
@@ -278,6 +285,7 @@ def 解析一个Rust项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, sys
import glob, os
if os.path.exists(txt):
project_folder = txt
validate_path_safety(project_folder, chatbot.get_user())
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a=f"解析项目: {txt}", b=f"找不到本地项目或无权访问: {txt}")
@@ -298,6 +306,7 @@ def 解析一个Lua项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
import glob, os
if os.path.exists(txt):
project_folder = txt
validate_path_safety(project_folder, chatbot.get_user())
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
@@ -320,6 +329,7 @@ def 解析一个CSharp项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
import glob, os
if os.path.exists(txt):
project_folder = txt
validate_path_safety(project_folder, chatbot.get_user())
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
@@ -345,15 +355,19 @@ def 解析任意code项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, sys
pattern_except_suffix = [_.lstrip(" ^*.,").rstrip(" ,") for _ in txt_pattern.split(" ") if _ != "" and _.strip().startswith("^*.")]
pattern_except_suffix += ['zip', 'rar', '7z', 'tar', 'gz'] # 避免解析压缩文件
# 将要忽略匹配的文件名(例如: ^README.md)
pattern_except_name = [_.lstrip(" ^*,").rstrip(" ,").replace(".", "\.") for _ in txt_pattern.split(" ") if _ != "" and _.strip().startswith("^") and not _.strip().startswith("^*.")]
pattern_except_name = [_.lstrip(" ^*,").rstrip(" ,").replace(".", r"\.") # 移除左边通配符,移除右侧逗号,转义点号
for _ in txt_pattern.split(" ") # 以空格分割
if (_ != "" and _.strip().startswith("^") and not _.strip().startswith("^*.")) # ^开始,但不是^*.开始
]
# 生成正则表达式
pattern_except = '/[^/]+\.(' + "|".join(pattern_except_suffix) + ')$'
pattern_except = r'/[^/]+\.(' + "|".join(pattern_except_suffix) + ')$'
pattern_except += '|/(' + "|".join(pattern_except_name) + ')$' if pattern_except_name != [] else ''
history.clear()
import glob, os, re
if os.path.exists(txt):
project_folder = txt
validate_path_safety(project_folder, chatbot.get_user())
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")

查看文件

@@ -0,0 +1,162 @@
import os, copy, time
from toolbox import CatchException, report_exception, update_ui, zip_result, promote_file_to_downloadzone, update_ui_lastest_msg, get_conf, generate_file_link
from shared_utils.fastapi_server import validate_path_safety
from crazy_functions.crazy_utils import input_clipping
from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.agent_fns.python_comment_agent import PythonCodeComment
from crazy_functions.diagram_fns.file_tree import FileNode
from crazy_functions.agent_fns.watchdog import WatchDog
from shared_utils.advanced_markdown_format import markdown_convertion_for_file
from loguru import logger
def 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
summary_batch_isolation = True
inputs_array = []
inputs_show_user_array = []
history_array = []
sys_prompt_array = []
assert len(file_manifest) <= 512, "源文件太多超过512个, 请缩减输入文件的数量。或者,您也可以选择删除此行警告,并修改代码拆分file_manifest列表,从而实现分批次处理。"
# 建立文件树
file_tree_struct = FileNode("root", build_manifest=True)
for file_path in file_manifest:
file_tree_struct.add_file(file_path, file_path)
# <第一步,逐个文件分析,多线程>
lang = "" if not plugin_kwargs["use_chinese"] else " (you must use Chinese)"
for index, fp in enumerate(file_manifest):
# 读取文件
with open(fp, 'r', encoding='utf-8', errors='replace') as f:
file_content = f.read()
prefix = ""
i_say = prefix + f'Please conclude the following source code at {os.path.relpath(fp, project_folder)} with only one sentence{lang}, the code is:\n```{file_content}```'
i_say_show_user = prefix + f'[{index+1}/{len(file_manifest)}] 请用一句话对下面的程序文件做一个整体概述: {fp}'
# 装载请求内容
MAX_TOKEN_SINGLE_FILE = 2560
i_say, _ = input_clipping(inputs=i_say, history=[], max_token_limit=MAX_TOKEN_SINGLE_FILE)
inputs_array.append(i_say)
inputs_show_user_array.append(i_say_show_user)
history_array.append([])
sys_prompt_array.append(f"You are a software architecture analyst analyzing a source code project. Do not dig into details, tell me what the code is doing in general. Your answer must be short, simple and clear{lang}.")
# 文件读取完成,对每一个源代码文件,生成一个请求线程,发送到大模型进行分析
gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
inputs_array = inputs_array,
inputs_show_user_array = inputs_show_user_array,
history_array = history_array,
sys_prompt_array = sys_prompt_array,
llm_kwargs = llm_kwargs,
chatbot = chatbot,
show_user_at_complete = True
)
# <第二步,逐个文件分析,生成带注释文件>
tasks = ["" for _ in range(len(file_manifest))]
def bark_fn(tasks):
for i in range(len(tasks)): tasks[i] = "watchdog is dead"
wd = WatchDog(timeout=10, bark_fn=lambda: bark_fn(tasks), interval=3, msg="ThreadWatcher timeout")
wd.begin_watch()
from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor(max_workers=get_conf('DEFAULT_WORKER_NUM'))
def _task_multi_threading(i_say, gpt_say, fp, file_tree_struct, index):
language = 'Chinese' if plugin_kwargs["use_chinese"] else 'English'
def observe_window_update(x):
if tasks[index] == "watchdog is dead":
raise TimeoutError("ThreadWatcher: watchdog is dead")
tasks[index] = x
pcc = PythonCodeComment(llm_kwargs, plugin_kwargs, language=language, observe_window_update=observe_window_update)
pcc.read_file(path=fp, brief=gpt_say)
revised_path, revised_content = pcc.begin_comment_source_code(None, None)
file_tree_struct.manifest[fp].revised_path = revised_path
file_tree_struct.manifest[fp].revised_content = revised_content
# <将结果写回源文件>
with open(fp, 'w', encoding='utf-8') as f:
f.write(file_tree_struct.manifest[fp].revised_content)
# <生成对比html>
with open("crazy_functions/agent_fns/python_comment_compare.html", 'r', encoding='utf-8') as f:
html_template = f.read()
warp = lambda x: "```python\n\n" + x + "\n\n```"
from themes.theme import load_dynamic_theme
_, advanced_css, _, _ = load_dynamic_theme("Default")
html_template = html_template.replace("ADVANCED_CSS", advanced_css)
html_template = html_template.replace("REPLACE_CODE_FILE_LEFT", pcc.get_markdown_block_in_html(markdown_convertion_for_file(warp(pcc.original_content))))
html_template = html_template.replace("REPLACE_CODE_FILE_RIGHT", pcc.get_markdown_block_in_html(markdown_convertion_for_file(warp(revised_content))))
compare_html_path = fp + '.compare.html'
file_tree_struct.manifest[fp].compare_html = compare_html_path
with open(compare_html_path, 'w', encoding='utf-8') as f:
f.write(html_template)
tasks[index] = ""
chatbot.append([None, f"正在处理:"])
futures = []
index = 0
for i_say, gpt_say, fp in zip(gpt_response_collection[0::2], gpt_response_collection[1::2], file_manifest):
future = executor.submit(_task_multi_threading, i_say, gpt_say, fp, file_tree_struct, index)
index += 1
futures.append(future)
# <第三步,等待任务完成>
cnt = 0
while True:
cnt += 1
wd.feed()
time.sleep(3)
worker_done = [h.done() for h in futures]
remain = len(worker_done) - sum(worker_done)
# <展示已经完成的部分>
preview_html_list = []
for done, fp in zip(worker_done, file_manifest):
if not done: continue
if hasattr(file_tree_struct.manifest[fp], 'compare_html'):
preview_html_list.append(file_tree_struct.manifest[fp].compare_html)
else:
logger.error(f"文件: {fp} 的注释结果未能成功")
file_links = generate_file_link(preview_html_list)
yield from update_ui_lastest_msg(
f"当前任务: <br/>{'<br/>'.join(tasks)}.<br/>" +
f"剩余源文件数量: {remain}.<br/>" +
f"已完成的文件: {sum(worker_done)}.<br/>" +
file_links +
"<br/>" +
''.join(['.']*(cnt % 10 + 1)
), chatbot=chatbot, history=history, delay=0)
yield from update_ui(chatbot=chatbot, history=[]) # 刷新界面
if all(worker_done):
executor.shutdown()
break
# <第四步,压缩结果>
zip_res = zip_result(project_folder)
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
# <END>
chatbot.append((None, "所有源文件均已处理完毕。"))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
@CatchException
def 注释Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
history = [] # 清空历史,以免输入溢出
plugin_kwargs["use_chinese"] = plugin_kwargs.get("use_chinese", False)
import glob, os
if os.path.exists(txt):
project_folder = txt
validate_path_safety(project_folder, chatbot.get_user())
else:
if txt == "": txt = '空空如也的输入栏'
report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.py', recursive=True)]
if len(file_manifest) == 0:
report_exception(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何python文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
yield from 注释源代码(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)

查看文件

@@ -0,0 +1,36 @@
from toolbox import get_conf, update_ui
from crazy_functions.plugin_template.plugin_class_template import GptAcademicPluginTemplate, ArgProperty
from crazy_functions.SourceCode_Comment import 注释Python项目
class SourceCodeComment_Wrap(GptAcademicPluginTemplate):
def __init__(self):
"""
请注意`execute`会执行在不同的线程中,因此您在定义和使用类变量时,应当慎之又慎!
"""
pass
def define_arg_selection_menu(self):
"""
定义插件的二级选项菜单
"""
gui_definition = {
"main_input":
ArgProperty(title="路径", description="程序路径(上传文件后自动填写)", default_value="", type="string").model_dump_json(), # 主输入,自动从输入框同步
"use_chinese":
ArgProperty(title="注释语言", options=["英文", "中文"], default_value="英文", description="", type="dropdown").model_dump_json(),
# "use_emoji":
# ArgProperty(title="在注释中使用emoji", options=["禁止", "允许"], default_value="禁止", description="无", type="dropdown").model_dump_json(),
}
return gui_definition
def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
"""
执行插件
"""
if plugin_kwargs["use_chinese"] == "中文":
plugin_kwargs["use_chinese"] = True
else:
plugin_kwargs["use_chinese"] = False
yield from 注释Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)

查看文件

@@ -1,4 +1,5 @@
from crazy_functions.agent_fns.pipe import PluginMultiprocessManager, PipeCom
from loguru import logger
class EchoDemo(PluginMultiprocessManager):
def subprocess_worker(self, child_conn):
@@ -16,4 +17,4 @@ class EchoDemo(PluginMultiprocessManager):
elif msg.cmd == "terminate":
self.child_conn.send(PipeCom("done", ""))
break
print('[debug] subprocess_worker terminated')
logger.info('[debug] subprocess_worker terminated')

查看文件

@@ -1,5 +1,6 @@
from toolbox import get_log_folder, update_ui, gen_time_str, get_conf, promote_file_to_downloadzone
from crazy_functions.agent_fns.watchdog import WatchDog
from loguru import logger
import time, os
class PipeCom:
@@ -47,7 +48,7 @@ class PluginMultiprocessManager:
def terminate(self):
self.p.terminate()
self.alive = False
print("[debug] instance terminated")
logger.info("[debug] instance terminated")
def subprocess_worker(self, child_conn):
# ⭐⭐ run in subprocess

查看文件

@@ -0,0 +1,457 @@
import datetime
import re
import os
from loguru import logger
from textwrap import dedent
from toolbox import CatchException, update_ui
from request_llms.bridge_all import predict_no_ui_long_connection
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
# TODO: 解决缩进问题
find_function_end_prompt = '''
Below is a page of code that you need to read. This page may not yet complete, you job is to split this page to sperate functions, class functions etc.
- Provide the line number where the first visible function ends.
- Provide the line number where the next visible function begins.
- If there are no other functions in this page, you should simply return the line number of the last line.
- Only focus on functions declared by `def` keyword. Ignore inline functions. Ignore function calls.
------------------ Example ------------------
INPUT:
```
L0000 |import sys
L0001 |import re
L0002 |
L0003 |def trimmed_format_exc():
L0004 | import os
L0005 | import traceback
L0006 | str = traceback.format_exc()
L0007 | current_path = os.getcwd()
L0008 | replace_path = "."
L0009 | return str.replace(current_path, replace_path)
L0010 |
L0011 |
L0012 |def trimmed_format_exc_markdown():
L0013 | ...
L0014 | ...
```
OUTPUT:
```
<first_function_end_at>L0009</first_function_end_at>
<next_function_begin_from>L0012</next_function_begin_from>
```
------------------ End of Example ------------------
------------------ the real INPUT you need to process NOW ------------------
```
{THE_TAGGED_CODE}
```
'''
revise_funtion_prompt = '''
You need to read the following code, and revise the source code ({FILE_BASENAME}) according to following instructions:
1. You should analyze the purpose of the functions (if there are any).
2. You need to add docstring for the provided functions (if there are any).
Be aware:
1. You must NOT modify the indent of code.
2. You are NOT authorized to change or translate non-comment code, and you are NOT authorized to add empty lines either, toggle qu.
3. Use {LANG} to add comments and docstrings. Do NOT translate Chinese that is already in the code.
4. Besides adding a docstring, use the ⭐ symbol to annotate the most core and important line of code within the function, explaining its role.
------------------ Example ------------------
INPUT:
```
L0000 |
L0001 |def zip_result(folder):
L0002 | t = gen_time_str()
L0003 | zip_folder(folder, get_log_folder(), f"result.zip")
L0004 | return os.path.join(get_log_folder(), f"result.zip")
L0005 |
L0006 |
```
OUTPUT:
<instruction_1_purpose>
This function compresses a given folder, and return the path of the resulting `zip` file.
</instruction_1_purpose>
<instruction_2_revised_code>
```
def zip_result(folder):
"""
Compresses the specified folder into a zip file and stores it in the log folder.
Args:
folder (str): The path to the folder that needs to be compressed.
Returns:
str: The path to the created zip file in the log folder.
"""
t = gen_time_str()
zip_folder(folder, get_log_folder(), f"result.zip") # ⭐ Execute the zipping of folder
return os.path.join(get_log_folder(), f"result.zip")
```
</instruction_2_revised_code>
------------------ End of Example ------------------
------------------ the real INPUT you need to process NOW ({FILE_BASENAME}) ------------------
```
{THE_CODE}
```
{INDENT_REMINDER}
{BRIEF_REMINDER}
{HINT_REMINDER}
'''
revise_funtion_prompt_chinese = '''
您需要阅读以下代码,并根据以下说明修订源代码({FILE_BASENAME}):
1. 如果源代码中包含函数的话, 你应该分析给定函数实现了什么功能
2. 如果源代码中包含函数的话, 你需要为函数添加docstring, docstring必须使用中文
请注意:
1. 你不得修改代码的缩进
2. 你无权更改或翻译代码中的非注释部分,也不允许添加空行
3. 使用 {LANG} 添加注释和文档字符串。不要翻译代码中已有的中文
4. 除了添加docstring之外, 使用⭐符号给该函数中最核心、最重要的一行代码添加注释,并说明其作用
------------------ 示例 ------------------
INPUT:
```
L0000 |
L0001 |def zip_result(folder):
L0002 | t = gen_time_str()
L0003 | zip_folder(folder, get_log_folder(), f"result.zip")
L0004 | return os.path.join(get_log_folder(), f"result.zip")
L0005 |
L0006 |
```
OUTPUT:
<instruction_1_purpose>
该函数用于压缩指定文件夹,并返回生成的`zip`文件的路径。
</instruction_1_purpose>
<instruction_2_revised_code>
```
def zip_result(folder):
"""
该函数将指定的文件夹压缩成ZIP文件, 并将其存储在日志文件夹中。
输入参数:
folder (str): 需要压缩的文件夹的路径。
返回值:
str: 日志文件夹中创建的ZIP文件的路径。
"""
t = gen_time_str()
zip_folder(folder, get_log_folder(), f"result.zip") # ⭐ 执行文件夹的压缩
return os.path.join(get_log_folder(), f"result.zip")
```
</instruction_2_revised_code>
------------------ End of Example ------------------
------------------ the real INPUT you need to process NOW ({FILE_BASENAME}) ------------------
```
{THE_CODE}
```
{INDENT_REMINDER}
{BRIEF_REMINDER}
{HINT_REMINDER}
'''
class PythonCodeComment():
def __init__(self, llm_kwargs, plugin_kwargs, language, observe_window_update) -> None:
self.original_content = ""
self.full_context = []
self.full_context_with_line_no = []
self.current_page_start = 0
self.page_limit = 100 # 100 lines of code each page
self.ignore_limit = 20
self.llm_kwargs = llm_kwargs
self.plugin_kwargs = plugin_kwargs
self.language = language
self.observe_window_update = observe_window_update
if self.language == "chinese":
self.core_prompt = revise_funtion_prompt_chinese
else:
self.core_prompt = revise_funtion_prompt
self.path = None
self.file_basename = None
self.file_brief = ""
def generate_tagged_code_from_full_context(self):
for i, code in enumerate(self.full_context):
number = i
padded_number = f"{number:04}"
result = f"L{padded_number}"
self.full_context_with_line_no.append(f"{result} | {code}")
return self.full_context_with_line_no
def read_file(self, path, brief):
with open(path, 'r', encoding='utf8') as f:
self.full_context = f.readlines()
self.original_content = ''.join(self.full_context)
self.file_basename = os.path.basename(path)
self.file_brief = brief
self.full_context_with_line_no = self.generate_tagged_code_from_full_context()
self.path = path
def find_next_function_begin(self, tagged_code:list, begin_and_end):
begin, end = begin_and_end
THE_TAGGED_CODE = ''.join(tagged_code)
self.llm_kwargs['temperature'] = 0
result = predict_no_ui_long_connection(
inputs=find_function_end_prompt.format(THE_TAGGED_CODE=THE_TAGGED_CODE),
llm_kwargs=self.llm_kwargs,
history=[],
sys_prompt="",
observe_window=[],
console_slience=True
)
def extract_number(text):
# 使用正则表达式匹配模式
match = re.search(r'<next_function_begin_from>L(\d+)</next_function_begin_from>', text)
if match:
# 提取匹配的数字部分并转换为整数
return int(match.group(1))
return None
line_no = extract_number(result)
if line_no is not None:
return line_no
else:
return end
def _get_next_window(self):
#
current_page_start = self.current_page_start
if self.current_page_start == len(self.full_context) + 1:
raise StopIteration
# 如果剩余的行数非常少,一鼓作气处理掉
if len(self.full_context) - self.current_page_start < self.ignore_limit:
future_page_start = len(self.full_context) + 1
self.current_page_start = future_page_start
return current_page_start, future_page_start
tagged_code = self.full_context_with_line_no[ self.current_page_start: self.current_page_start + self.page_limit]
line_no = self.find_next_function_begin(tagged_code, [self.current_page_start, self.current_page_start + self.page_limit])
if line_no > len(self.full_context) - 5:
line_no = len(self.full_context) + 1
future_page_start = line_no
self.current_page_start = future_page_start
# ! consider eof
return current_page_start, future_page_start
def dedent(self, text):
"""Remove any common leading whitespace from every line in `text`.
"""
# Look for the longest leading string of spaces and tabs common to
# all lines.
margin = None
_whitespace_only_re = re.compile('^[ \t]+$', re.MULTILINE)
_leading_whitespace_re = re.compile('(^[ \t]*)(?:[^ \t\n])', re.MULTILINE)
text = _whitespace_only_re.sub('', text)
indents = _leading_whitespace_re.findall(text)
for indent in indents:
if margin is None:
margin = indent
# Current line more deeply indented than previous winner:
# no change (previous winner is still on top).
elif indent.startswith(margin):
pass
# Current line consistent with and no deeper than previous winner:
# it's the new winner.
elif margin.startswith(indent):
margin = indent
# Find the largest common whitespace between current line and previous
# winner.
else:
for i, (x, y) in enumerate(zip(margin, indent)):
if x != y:
margin = margin[:i]
break
# sanity check (testing/debugging only)
if 0 and margin:
for line in text.split("\n"):
assert not line or line.startswith(margin), \
"line = %r, margin = %r" % (line, margin)
if margin:
text = re.sub(r'(?m)^' + margin, '', text)
return text, len(margin)
else:
return text, 0
def get_next_batch(self):
current_page_start, future_page_start = self._get_next_window()
return ''.join(self.full_context[current_page_start: future_page_start]), current_page_start, future_page_start
def tag_code(self, fn, hint):
code = fn
_, n_indent = self.dedent(code)
indent_reminder = "" if n_indent == 0 else "(Reminder: as you can see, this piece of code has indent made up with {n_indent} whitespace, please preseve them in the OUTPUT.)"
brief_reminder = "" if self.file_brief == "" else f"({self.file_basename} abstract: {self.file_brief})"
hint_reminder = "" if hint is None else f"(Reminder: do not ignore or modify code such as `{hint}`, provide complete code in the OUTPUT.)"
self.llm_kwargs['temperature'] = 0
result = predict_no_ui_long_connection(
inputs=self.core_prompt.format(
LANG=self.language,
FILE_BASENAME=self.file_basename,
THE_CODE=code,
INDENT_REMINDER=indent_reminder,
BRIEF_REMINDER=brief_reminder,
HINT_REMINDER=hint_reminder
),
llm_kwargs=self.llm_kwargs,
history=[],
sys_prompt="",
observe_window=[],
console_slience=True
)
def get_code_block(reply):
import re
pattern = r"```([\s\S]*?)```" # regex pattern to match code blocks
matches = re.findall(pattern, reply) # find all code blocks in text
if len(matches) == 1:
return matches[0].strip('python') # code block
return None
code_block = get_code_block(result)
if code_block is not None:
code_block = self.sync_and_patch(original=code, revised=code_block)
return code_block
else:
return code
def get_markdown_block_in_html(self, html):
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
found_list = soup.find_all("div", class_="markdown-body")
if found_list:
res = found_list[0]
return res.prettify()
else:
return None
def sync_and_patch(self, original, revised):
"""Ensure the number of pre-string empty lines in revised matches those in original."""
def count_leading_empty_lines(s, reverse=False):
"""Count the number of leading empty lines in a string."""
lines = s.split('\n')
if reverse: lines = list(reversed(lines))
count = 0
for line in lines:
if line.strip() == '':
count += 1
else:
break
return count
original_empty_lines = count_leading_empty_lines(original)
revised_empty_lines = count_leading_empty_lines(revised)
if original_empty_lines > revised_empty_lines:
additional_lines = '\n' * (original_empty_lines - revised_empty_lines)
revised = additional_lines + revised
elif original_empty_lines < revised_empty_lines:
lines = revised.split('\n')
revised = '\n'.join(lines[revised_empty_lines - original_empty_lines:])
original_empty_lines = count_leading_empty_lines(original, reverse=True)
revised_empty_lines = count_leading_empty_lines(revised, reverse=True)
if original_empty_lines > revised_empty_lines:
additional_lines = '\n' * (original_empty_lines - revised_empty_lines)
revised = revised + additional_lines
elif original_empty_lines < revised_empty_lines:
lines = revised.split('\n')
revised = '\n'.join(lines[:-(revised_empty_lines - original_empty_lines)])
return revised
def begin_comment_source_code(self, chatbot=None, history=None):
# from toolbox import update_ui_lastest_msg
assert self.path is not None
assert '.py' in self.path # must be python source code
# write_target = self.path + '.revised.py'
write_content = ""
# with open(self.path + '.revised.py', 'w+', encoding='utf8') as f:
while True:
try:
# yield from update_ui_lastest_msg(f"({self.file_basename}) 正在读取下一段代码片段:\n", chatbot=chatbot, history=history, delay=0)
next_batch, line_no_start, line_no_end = self.get_next_batch()
self.observe_window_update(f"正在处理{self.file_basename} - {line_no_start}/{len(self.full_context)}\n")
# yield from update_ui_lastest_msg(f"({self.file_basename}) 处理代码片段:\n\n{next_batch}", chatbot=chatbot, history=history, delay=0)
hint = None
MAX_ATTEMPT = 2
for attempt in range(MAX_ATTEMPT):
result = self.tag_code(next_batch, hint)
try:
successful, hint = self.verify_successful(next_batch, result)
except Exception as e:
logger.error('ignored exception:\n' + str(e))
break
if successful:
break
if attempt == MAX_ATTEMPT - 1:
# cannot deal with this, give up
result = next_batch
break
# f.write(result)
write_content += result
except StopIteration:
next_batch, line_no_start, line_no_end = [], -1, -1
return None, write_content
def verify_successful(self, original, revised):
""" Determine whether the revised code contains every line that already exists
"""
from crazy_functions.ast_fns.comment_remove import remove_python_comments
original = remove_python_comments(original)
original_lines = original.split('\n')
revised_lines = revised.split('\n')
for l in original_lines:
l = l.strip()
if '\'' in l or '\"' in l: continue # ast sometimes toggle " to '
found = False
for lt in revised_lines:
if l in lt:
found = True
break
if not found:
return False, l
return True, None

查看文件

@@ -0,0 +1,45 @@
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<style>ADVANCED_CSS</style>
<meta charset="UTF-8">
<title>源文件对比</title>
<style>
body {
font-family: Arial, sans-serif;
display: flex;
justify-content: center;
align-items: center;
height: 100vh;
margin: 0;
}
.container {
display: flex;
width: 95%;
height: -webkit-fill-available;
}
.code-container {
flex: 1;
margin: 0px;
padding: 0px;
border: 1px solid #ccc;
background-color: #f9f9f9;
overflow: auto;
}
pre {
white-space: pre-wrap;
word-wrap: break-word;
}
</style>
</head>
<body>
<div class="container">
<div class="code-container">
REPLACE_CODE_FILE_LEFT
</div>
<div class="code-container">
REPLACE_CODE_FILE_RIGHT
</div>
</div>
</body>
</html>

查看文件

@@ -1,4 +1,5 @@
import threading, time
from loguru import logger
class WatchDog():
def __init__(self, timeout, bark_fn, interval=3, msg="") -> None:
@@ -13,7 +14,7 @@ class WatchDog():
while True:
if self.kill_dog: break
if time.time() - self.last_feed > self.timeout:
if len(self.msg) > 0: print(self.msg)
if len(self.msg) > 0: logger.info(self.msg)
self.bark_fn()
break
time.sleep(self.interval)

查看文件

@@ -0,0 +1,54 @@
import token
import tokenize
import copy
import io
def remove_python_comments(input_source: str) -> str:
source_flag = copy.copy(input_source)
source = io.StringIO(input_source)
ls = input_source.split('\n')
prev_toktype = token.INDENT
readline = source.readline
def get_char_index(lineno, col):
# find the index of the char in the source code
if lineno == 1:
return len('\n'.join(ls[:(lineno-1)])) + col
else:
return len('\n'.join(ls[:(lineno-1)])) + col + 1
def replace_char_between(start_lineno, start_col, end_lineno, end_col, source, replace_char, ls):
# replace char between start_lineno, start_col and end_lineno, end_col with replace_char, but keep '\n' and ' '
b = get_char_index(start_lineno, start_col)
e = get_char_index(end_lineno, end_col)
for i in range(b, e):
if source[i] == '\n':
source = source[:i] + '\n' + source[i+1:]
elif source[i] == ' ':
source = source[:i] + ' ' + source[i+1:]
else:
source = source[:i] + replace_char + source[i+1:]
return source
tokgen = tokenize.generate_tokens(readline)
for toktype, ttext, (slineno, scol), (elineno, ecol), ltext in tokgen:
if toktype == token.STRING and (prev_toktype == token.INDENT):
source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
elif toktype == token.STRING and (prev_toktype == token.NEWLINE):
source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
elif toktype == tokenize.COMMENT:
source_flag = replace_char_between(slineno, scol, elineno, ecol, source_flag, ' ', ls)
prev_toktype = toktype
return source_flag
# 示例使用
if __name__ == "__main__":
with open("source.py", "r", encoding="utf-8") as f:
source_code = f.read()
cleaned_code = remove_python_comments(source_code)
with open("cleaned_source.py", "w", encoding="utf-8") as f:
f.write(cleaned_code)

查看文件

@@ -1,5 +1,5 @@
from toolbox import CatchException, update_ui, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
import datetime, json
def fetch_items(list_of_items, batch_size):

查看文件

@@ -1,25 +1,39 @@
from toolbox import update_ui, get_conf, trimmed_format_exc, get_max_token, Singleton
import threading
import os
import logging
import threading
from loguru import logger
from shared_utils.char_visual_effect import scolling_visual_effect
from toolbox import update_ui, get_conf, trimmed_format_exc, get_max_token, Singleton
def input_clipping(inputs, history, max_token_limit):
def input_clipping(inputs, history, max_token_limit, return_clip_flags=False):
"""
当输入文本 + 历史文本超出最大限制时,采取措施丢弃一部分文本。
输入:
- inputs 本次请求
- history 历史上下文
- max_token_limit 最大token限制
输出:
- inputs 本次请求经过clip
- history 历史上下文经过clip
"""
import numpy as np
from request_llms.bridge_all import model_info
enc = model_info["gpt-3.5-turbo"]['tokenizer']
def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
mode = 'input-and-history'
# 当 输入部分的token占比 小于 全文的一半时,只裁剪历史
input_token_num = get_token_num(inputs)
original_input_len = len(inputs)
if input_token_num < max_token_limit//2:
mode = 'only-history'
max_token_limit = max_token_limit - input_token_num
everything = [inputs] if mode == 'input-and-history' else ['']
everything.extend(history)
n_token = get_token_num('\n'.join(everything))
full_token_num = n_token = get_token_num('\n'.join(everything))
everything_token = [get_token_num(e) for e in everything]
everything_token_num = sum(everything_token)
delta = max(everything_token) // 16 # 截断时的颗粒度
while n_token > max_token_limit:
@@ -32,10 +46,24 @@ def input_clipping(inputs, history, max_token_limit):
if mode == 'input-and-history':
inputs = everything[0]
full_token_num = everything_token_num
else:
pass
full_token_num = everything_token_num + input_token_num
history = everything[1:]
flags = {
"mode": mode,
"original_input_token_num": input_token_num,
"original_full_token_num": full_token_num,
"original_input_len": original_input_len,
"clipped_input_len": len(inputs),
}
if not return_clip_flags:
return inputs, history
else:
return inputs, history, flags
def request_gpt_model_in_new_thread_with_ui_alive(
inputs, inputs_show_user, llm_kwargs,
@@ -105,7 +133,7 @@ def request_gpt_model_in_new_thread_with_ui_alive(
except:
# 【第三种情况】:其他错误:重试几次
tb_str = '```\n' + trimmed_format_exc() + '```'
print(tb_str)
logger.error(tb_str)
mutable[0] += f"[Local Message] 警告,在执行过程中遭遇问题, Traceback\n\n{tb_str}\n\n"
if retry_op > 0:
retry_op -= 1
@@ -135,18 +163,30 @@ def request_gpt_model_in_new_thread_with_ui_alive(
yield from update_ui(chatbot=chatbot, history=[]) # 如果最后成功了,则删除报错信息
return final_result
def can_multi_process(llm):
def can_multi_process(llm) -> bool:
from request_llms.bridge_all import model_info
def default_condition(llm) -> bool:
# legacy condition
if llm.startswith('gpt-'): return True
if llm.startswith('api2d-'): return True
if llm.startswith('azure-'): return True
if llm.startswith('spark'): return True
if llm.startswith('zhipuai'): return True
if llm.startswith('zhipuai') or llm.startswith('glm-'): return True
return False
if llm in model_info:
if 'can_multi_thread' in model_info[llm]:
return model_info[llm]['can_multi_thread']
else:
return default_condition(llm)
else:
return default_condition(llm)
def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
inputs_array, inputs_show_user_array, llm_kwargs,
chatbot, history_array, sys_prompt_array,
refresh_interval=0.2, max_workers=-1, scroller_max_len=30,
refresh_interval=0.2, max_workers=-1, scroller_max_len=75,
handle_token_exceed=True, show_user_at_complete=False,
retry_times_at_unknown_error=2,
):
@@ -243,7 +283,7 @@ def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
# 【第三种情况】:其他错误
if detect_timeout(): raise RuntimeError("检测到程序终止。")
tb_str = '```\n' + trimmed_format_exc() + '```'
print(tb_str)
logger.error(tb_str)
gpt_say += f"[Local Message] 警告,线程{index}在执行过程中遭遇问题, Traceback\n\n{tb_str}\n\n"
if len(mutable[index][0]) > 0: gpt_say += "此线程失败前收到的回答:\n\n" + mutable[index][0]
if retry_op > 0:
@@ -271,6 +311,8 @@ def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
futures = [executor.submit(_req_gpt, index, inputs, history, sys_prompt) for index, inputs, history, sys_prompt in zip(
range(len(inputs_array)), inputs_array, history_array, sys_prompt_array)]
cnt = 0
while True:
# yield一次以刷新前端页面
time.sleep(refresh_interval)
@@ -283,8 +325,7 @@ def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
mutable[thread_index][1] = time.time()
# 在前端打印些好玩的东西
for thread_index, _ in enumerate(worker_done):
print_something_really_funny = "[ ...`"+mutable[thread_index][0][-scroller_max_len:].\
replace('\n', '').replace('`', '.').replace(' ', '.').replace('<br/>', '.....').replace('$', '.')+"`... ]"
print_something_really_funny = f"[ ...`{scolling_visual_effect(mutable[thread_index][0], scroller_max_len)}`... ]"
observe_win.append(print_something_really_funny)
# 在前端打印些好玩的东西
stat_str = ''.join([f'`{mutable[thread_index][2]}`: {obs}\n\n'
@@ -337,7 +378,7 @@ def read_and_clean_pdf_text(fp):
import fitz, copy
import re
import numpy as np
from colorful import print亮黄, print亮绿
# from shared_utils.colorful import print亮黄, print亮绿
fc = 0 # Index 0 文本
fs = 1 # Index 1 字体
fb = 2 # Index 2 框框
@@ -554,15 +595,15 @@ class nougat_interface():
def nougat_with_timeout(self, command, cwd, timeout=3600):
import subprocess
from toolbox import ProxyNetworkActivate
logging.info(f'正在执行命令 {command}')
logger.info(f'正在执行命令 {command}')
with ProxyNetworkActivate("Nougat_Download"):
process = subprocess.Popen(command, shell=True, cwd=cwd, env=os.environ)
process = subprocess.Popen(command, shell=False, cwd=cwd, env=os.environ)
try:
stdout, stderr = process.communicate(timeout=timeout)
except subprocess.TimeoutExpired:
process.kill()
stdout, stderr = process.communicate()
print("Process timed out!")
logger.error("Process timed out!")
return False
return True
@@ -580,7 +621,8 @@ class nougat_interface():
yield from update_ui_lastest_msg("正在解析论文, 请稍候。进度正在加载NOUGAT... 提示首次运行需要花费较长时间下载NOUGAT参数",
chatbot=chatbot, history=history, delay=0)
self.nougat_with_timeout(f'nougat --out "{os.path.abspath(dst)}" "{os.path.abspath(fp)}"', os.getcwd(), timeout=3600)
command = ['nougat', '--out', os.path.abspath(dst), os.path.abspath(fp)]
self.nougat_with_timeout(command, cwd=os.getcwd(), timeout=3600)
res = glob.glob(os.path.join(dst,'*.mmd'))
if len(res) == 0:
self.threadLock.release()

查看文件

@@ -1,8 +1,9 @@
import os
from textwrap import indent
from loguru import logger
class FileNode:
def __init__(self, name):
def __init__(self, name, build_manifest=False):
self.name = name
self.children = []
self.is_leaf = False
@@ -10,6 +11,8 @@ class FileNode:
self.parenting_ship = []
self.comment = ""
self.comment_maxlen_show = 50
self.build_manifest = build_manifest
self.manifest = {}
@staticmethod
def add_linebreaks_at_spaces(string, interval=10):
@@ -29,6 +32,7 @@ class FileNode:
level = 1
if directory_names == "":
new_node = FileNode(file_name)
self.manifest[file_path] = new_node
current_node.children.append(new_node)
new_node.is_leaf = True
new_node.comment = self.sanitize_comment(file_comment)
@@ -50,13 +54,14 @@ class FileNode:
new_node.level = level - 1
current_node = new_node
term = FileNode(file_name)
self.manifest[file_path] = term
term.level = level
term.comment = self.sanitize_comment(file_comment)
term.is_leaf = True
current_node.children.append(term)
def print_files_recursively(self, level=0, code="R0"):
print(' '*level + self.name + ' ' + str(self.is_leaf) + ' ' + str(self.level))
logger.info(' '*level + self.name + ' ' + str(self.is_leaf) + ' ' + str(self.level))
for j, child in enumerate(self.children):
child.print_files_recursively(level=level+1, code=code+str(j))
self.parenting_ship.extend(child.parenting_ship)
@@ -119,4 +124,4 @@ if __name__ == "__main__":
"用于加载和分割文件中的文本的通用文件加载器用于加载和分割文件中的文本的通用文件加载器用于加载和分割文件中的文本的通用文件加载器",
"包含了用于构建和管理向量数据库的函数和类包含了用于构建和管理向量数据库的函数和类包含了用于构建和管理向量数据库的函数和类",
]
print(build_file_tree_mermaid_diagram(file_manifest, file_comments, "项目文件树"))
logger.info(build_file_tree_mermaid_diagram(file_manifest, file_comments, "项目文件树"))

查看文件

@@ -92,7 +92,7 @@ class MiniGame_ResumeStory(GptAcademicGameBaseState):
def generate_story_image(self, story_paragraph):
try:
from crazy_functions.图片生成 import gen_image
from crazy_functions.Image_Generate import gen_image
prompt_ = predict_no_ui_long_connection(inputs=story_paragraph, llm_kwargs=self.llm_kwargs, history=[], sys_prompt='你需要根据用户给出的小说段落,进行简短的环境描写。要求80字以内。')
image_url, image_path = gen_image(self.llm_kwargs, prompt_, '512x512', model="dall-e-2", quality='standard', style='natural')
return f'<br/><div align="center"><img src="file={image_path}"></div>'

查看文件

@@ -24,8 +24,8 @@ class Actor(BaseModel):
film_names: List[str] = Field(description="list of names of films they starred in")
"""
import json, re, logging
import json, re
from loguru import logger as logging
PYDANTIC_FORMAT_INSTRUCTIONS = """The output should be formatted as a JSON instance that conforms to the JSON schema below.
@@ -62,8 +62,8 @@ class GptJsonIO():
if "type" in reduced_schema:
del reduced_schema["type"]
# Ensure json in context is well-formed with double quotes.
if self.example_instruction:
schema_str = json.dumps(reduced_schema)
if self.example_instruction:
return PYDANTIC_FORMAT_INSTRUCTIONS.format(schema=schema_str)
else:
return PYDANTIC_FORMAT_INSTRUCTIONS_SIMPLE.format(schema=schema_str)

查看文件

@@ -0,0 +1,26 @@
from crazy_functions.json_fns.pydantic_io import GptJsonIO, JsonStringError
def structure_output(txt, prompt, err_msg, run_gpt_fn, pydantic_cls):
gpt_json_io = GptJsonIO(pydantic_cls)
analyze_res = run_gpt_fn(
txt,
sys_prompt=prompt + gpt_json_io.format_instructions
)
try:
friend = gpt_json_io.generate_output_auto_repair(analyze_res, run_gpt_fn)
except JsonStringError as e:
return None, err_msg
err_msg = ""
return friend, err_msg
def select_tool(prompt, run_gpt_fn, pydantic_cls):
pydantic_cls_instance, err_msg = structure_output(
txt=prompt,
prompt="根据提示, 分析应该调用哪个工具函数\n\n",
err_msg=f"不能理解该联系人",
run_gpt_fn=run_gpt_fn,
pydantic_cls=pydantic_cls
)
return pydantic_cls_instance, err_msg

查看文件

@@ -1,14 +1,17 @@
from toolbox import update_ui, update_ui_lastest_msg, get_log_folder
from toolbox import get_conf, objdump, objload, promote_file_to_downloadzone
from .latex_toolbox import PRESERVE, TRANSFORM
from .latex_toolbox import set_forbidden_text, set_forbidden_text_begin_end, set_forbidden_text_careful_brace
from .latex_toolbox import reverse_forbidden_text_careful_brace, reverse_forbidden_text, convert_to_linklist, post_process
from .latex_toolbox import fix_content, find_main_tex_file, merge_tex_files, compile_latex_with_timeout
from .latex_toolbox import find_title_and_abs
import os, shutil
import os
import re
import shutil
import numpy as np
from loguru import logger
from toolbox import update_ui, update_ui_lastest_msg, get_log_folder, gen_time_str
from toolbox import get_conf, promote_file_to_downloadzone
from crazy_functions.latex_fns.latex_toolbox import PRESERVE, TRANSFORM
from crazy_functions.latex_fns.latex_toolbox import set_forbidden_text, set_forbidden_text_begin_end, set_forbidden_text_careful_brace
from crazy_functions.latex_fns.latex_toolbox import reverse_forbidden_text_careful_brace, reverse_forbidden_text, convert_to_linklist, post_process
from crazy_functions.latex_fns.latex_toolbox import fix_content, find_main_tex_file, merge_tex_files, compile_latex_with_timeout
from crazy_functions.latex_fns.latex_toolbox import find_title_and_abs
from crazy_functions.latex_fns.latex_pickle_io import objdump, objload
pj = os.path.join
@@ -322,7 +325,7 @@ def remove_buggy_lines(file_path, log_path, tex_name, tex_name_pure, n_fix, work
buggy_lines = [int(l) for l in buggy_lines]
buggy_lines = sorted(buggy_lines)
buggy_line = buggy_lines[0]-1
print("reversing tex line that has errors", buggy_line)
logger.warning("reversing tex line that has errors", buggy_line)
# 重组,逆转出错的段落
if buggy_line not in fixed_line:
@@ -336,7 +339,7 @@ def remove_buggy_lines(file_path, log_path, tex_name, tex_name_pure, n_fix, work
return True, f"{tex_name_pure}_fix_{n_fix}", buggy_lines
except:
print("Fatal error occurred, but we cannot identify error, please download zip, read latex log, and compile manually.")
logger.error("Fatal error occurred, but we cannot identify error, please download zip, read latex log, and compile manually.")
return False, -1, [-1]
@@ -379,7 +382,7 @@ def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_f
if mode!='translate_zh':
yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 使用latexdiff生成论文转化前后对比 ...', chatbot, history) # 刷新Gradio前端界面
print( f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex')
logger.info( f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex')
ok = compile_latex_with_timeout(f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex', os.getcwd())
yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 正在编译对比PDF ...', chatbot, history) # 刷新Gradio前端界面
@@ -418,7 +421,7 @@ def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_f
shutil.copyfile(concat_pdf, pj(work_folder, '..', 'translation', 'comparison.pdf'))
promote_file_to_downloadzone(concat_pdf, rename_file=None, chatbot=chatbot) # promote file to web UI
except Exception as e:
print(e)
logger.error(e)
pass
return True # 成功啦
else:
@@ -464,4 +467,71 @@ def write_html(sp_file_contents, sp_file_result, chatbot, project_folder):
promote_file_to_downloadzone(file=res, chatbot=chatbot)
except:
from toolbox import trimmed_format_exc
print('writing html result failed:', trimmed_format_exc())
logger.error('writing html result failed:', trimmed_format_exc())
def upload_to_gptac_cloud_if_user_allow(chatbot, arxiv_id):
try:
# 如果用户允许,我们将arxiv论文PDF上传到GPTAC学术云
from toolbox import map_file_to_sha256
# 检查是否顺利,如果没有生成预期的文件,则跳过
is_result_good = False
for file_path in chatbot._cookies.get("files_to_promote", []):
if file_path.endswith('translate_zh.pdf'):
is_result_good = True
if not is_result_good:
return
# 上传文件
for file_path in chatbot._cookies.get("files_to_promote", []):
align_name = None
# normalized name
for name in ['translate_zh.pdf', 'comparison.pdf']:
if file_path.endswith(name): align_name = name
# if match any align name
if align_name:
logger.info(f'Uploading to GPTAC cloud as the user has set `allow_cloud_io`: {file_path}')
with open(file_path, 'rb') as f:
import requests
url = 'https://cloud-2.agent-matrix.com/arxiv_tf_paper_normal_upload'
files = {'file': (align_name, f, 'application/octet-stream')}
data = {
'arxiv_id': arxiv_id,
'file_hash': map_file_to_sha256(file_path),
'language': 'zh',
'trans_prompt': 'to_be_implemented',
'llm_model': 'to_be_implemented',
'llm_model_param': 'to_be_implemented',
}
resp = requests.post(url=url, files=files, data=data, timeout=30)
logger.info(f'Uploading terminate ({resp.status_code})`: {file_path}')
except:
# 如果上传失败,不会中断程序,因为这是次要功能
pass
def check_gptac_cloud(arxiv_id, chatbot):
import requests
success = False
downloaded = []
try:
for pdf_target in ['translate_zh.pdf', 'comparison.pdf']:
url = 'https://cloud-2.agent-matrix.com/arxiv_tf_paper_normal_exist'
data = {
'arxiv_id': arxiv_id,
'name': pdf_target,
}
resp = requests.post(url=url, data=data)
cache_hit_result = resp.text.strip('"')
if cache_hit_result.startswith("http"):
url = cache_hit_result
logger.info(f'Downloading from GPTAC cloud: {url}')
resp = requests.get(url=url, timeout=30)
target = os.path.join(get_log_folder(plugin_name='gptac_cloud'), gen_time_str(), pdf_target)
os.makedirs(os.path.dirname(target), exist_ok=True)
with open(target, 'wb') as f:
f.write(resp.content)
new_path = promote_file_to_downloadzone(target, chatbot=chatbot)
success = True
downloaded.append(new_path)
except:
pass
return success, downloaded

查看文件

@@ -0,0 +1,48 @@
import pickle
class SafeUnpickler(pickle.Unpickler):
def get_safe_classes(self):
from crazy_functions.latex_fns.latex_actions import LatexPaperFileGroup, LatexPaperSplit
from crazy_functions.latex_fns.latex_toolbox import LinkedListNode
from numpy.core.multiarray import scalar
from numpy import dtype
# 定义允许的安全类
safe_classes = {
# 在这里添加其他安全的类
'LatexPaperFileGroup': LatexPaperFileGroup,
'LatexPaperSplit': LatexPaperSplit,
'LinkedListNode': LinkedListNode,
'scalar': scalar,
'dtype': dtype,
}
return safe_classes
def find_class(self, module, name):
# 只允许特定的类进行反序列化
self.safe_classes = self.get_safe_classes()
match_class_name = None
for class_name in self.safe_classes.keys():
if (class_name in f'{module}.{name}'):
match_class_name = class_name
if match_class_name is not None:
return self.safe_classes[match_class_name]
# 如果尝试加载未授权的类,则抛出异常
raise pickle.UnpicklingError(f"Attempted to deserialize unauthorized class '{name}' from module '{module}'")
def objdump(obj, file="objdump.tmp"):
with open(file, "wb+") as f:
pickle.dump(obj, f)
return
def objload(file="objdump.tmp"):
import os
if not os.path.exists(file):
return
with open(file, "rb") as f:
unpickler = SafeUnpickler(f)
return unpickler.load()

查看文件

@@ -1,6 +1,8 @@
import os, shutil
import os
import re
import shutil
import numpy as np
from loguru import logger
PRESERVE = 0
TRANSFORM = 1
@@ -55,7 +57,7 @@ def post_process(root):
str_stack.append("{")
elif c == "}":
if len(str_stack) == 1:
print("stack fix")
logger.warning("fixing brace error")
return i
str_stack.pop(-1)
else:
@@ -601,7 +603,7 @@ def compile_latex_with_timeout(command, cwd, timeout=60):
except subprocess.TimeoutExpired:
process.kill()
stdout, stderr = process.communicate()
print("Process timed out!")
logger.error("Process timed out (compile_latex_with_timeout)!")
return False
return True
@@ -642,6 +644,216 @@ def run_in_subprocess(func):
def _merge_pdfs(pdf1_path, pdf2_path, output_path):
try:
logger.info("Merging PDFs using _merge_pdfs_ng")
_merge_pdfs_ng(pdf1_path, pdf2_path, output_path)
except:
logger.info("Merging PDFs using _merge_pdfs_legacy")
_merge_pdfs_legacy(pdf1_path, pdf2_path, output_path)
def _merge_pdfs_ng(pdf1_path, pdf2_path, output_path):
import PyPDF2 # PyPDF2这个库有严重的内存泄露问题,把它放到子进程中运行,从而方便内存的释放
from PyPDF2.generic import NameObject, TextStringObject, ArrayObject, FloatObject, NumberObject
Percent = 1
# raise RuntimeError('PyPDF2 has a serious memory leak problem, please use other tools to merge PDF files.')
# Open the first PDF file
with open(pdf1_path, "rb") as pdf1_file:
pdf1_reader = PyPDF2.PdfFileReader(pdf1_file)
# Open the second PDF file
with open(pdf2_path, "rb") as pdf2_file:
pdf2_reader = PyPDF2.PdfFileReader(pdf2_file)
# Create a new PDF file to store the merged pages
output_writer = PyPDF2.PdfFileWriter()
# Determine the number of pages in each PDF file
num_pages = max(pdf1_reader.numPages, pdf2_reader.numPages)
# Merge the pages from the two PDF files
for page_num in range(num_pages):
# Add the page from the first PDF file
if page_num < pdf1_reader.numPages:
page1 = pdf1_reader.getPage(page_num)
else:
page1 = PyPDF2.PageObject.createBlankPage(pdf1_reader)
# Add the page from the second PDF file
if page_num < pdf2_reader.numPages:
page2 = pdf2_reader.getPage(page_num)
else:
page2 = PyPDF2.PageObject.createBlankPage(pdf1_reader)
# Create a new empty page with double width
new_page = PyPDF2.PageObject.createBlankPage(
width=int(
int(page1.mediaBox.getWidth())
+ int(page2.mediaBox.getWidth()) * Percent
),
height=max(page1.mediaBox.getHeight(), page2.mediaBox.getHeight()),
)
new_page.mergeTranslatedPage(page1, 0, 0)
new_page.mergeTranslatedPage(
page2,
int(
int(page1.mediaBox.getWidth())
- int(page2.mediaBox.getWidth()) * (1 - Percent)
),
0,
)
if "/Annots" in new_page:
annotations = new_page["/Annots"]
for i, annot in enumerate(annotations):
annot_obj = annot.get_object()
# 检查注释类型是否是链接(/Link
if annot_obj.get("/Subtype") == "/Link":
# 检查是否为内部链接跳转(/GoTo或外部URI链接/URI
action = annot_obj.get("/A")
if action:
if "/S" in action and action["/S"] == "/GoTo":
# 内部链接:跳转到文档中的某个页面
dest = action.get("/D") # 目标页或目标位置
# if dest and annot.idnum in page2_annot_id:
# if dest in pdf2_reader.named_destinations:
if dest and page2.annotations:
if annot in page2.annotations:
# 获取原始文件中跳转信息,包括跳转页面
destination = pdf2_reader.named_destinations[
dest
]
page_number = (
pdf2_reader.get_destination_page_number(
destination
)
)
# 更新跳转信息,跳转到对应的页面和,指定坐标 (100, 150),缩放比例为 100%
# “/D”:[10,'/XYZ',100,100,0]
if destination.dest_array[1] == "/XYZ":
annot_obj["/A"].update(
{
NameObject("/D"): ArrayObject(
[
NumberObject(page_number),
destination.dest_array[1],
FloatObject(
destination.dest_array[
2
]
+ int(
page1.mediaBox.getWidth()
)
),
destination.dest_array[3],
destination.dest_array[4],
]
) # 确保键和值是 PdfObject
}
)
else:
annot_obj["/A"].update(
{
NameObject("/D"): ArrayObject(
[
NumberObject(page_number),
destination.dest_array[1],
]
) # 确保键和值是 PdfObject
}
)
rect = annot_obj.get("/Rect")
# 更新点击坐标
rect = ArrayObject(
[
FloatObject(
rect[0]
+ int(page1.mediaBox.getWidth())
),
rect[1],
FloatObject(
rect[2]
+ int(page1.mediaBox.getWidth())
),
rect[3],
]
)
annot_obj.update(
{
NameObject(
"/Rect"
): rect # 确保键和值是 PdfObject
}
)
# if dest and annot.idnum in page1_annot_id:
# if dest in pdf1_reader.named_destinations:
if dest and page1.annotations:
if annot in page1.annotations:
# 获取原始文件中跳转信息,包括跳转页面
destination = pdf1_reader.named_destinations[
dest
]
page_number = (
pdf1_reader.get_destination_page_number(
destination
)
)
# 更新跳转信息,跳转到对应的页面和,指定坐标 (100, 150),缩放比例为 100%
# “/D”:[10,'/XYZ',100,100,0]
if destination.dest_array[1] == "/XYZ":
annot_obj["/A"].update(
{
NameObject("/D"): ArrayObject(
[
NumberObject(page_number),
destination.dest_array[1],
FloatObject(
destination.dest_array[
2
]
),
destination.dest_array[3],
destination.dest_array[4],
]
) # 确保键和值是 PdfObject
}
)
else:
annot_obj["/A"].update(
{
NameObject("/D"): ArrayObject(
[
NumberObject(page_number),
destination.dest_array[1],
]
) # 确保键和值是 PdfObject
}
)
rect = annot_obj.get("/Rect")
rect = ArrayObject(
[
FloatObject(rect[0]),
rect[1],
FloatObject(rect[2]),
rect[3],
]
)
annot_obj.update(
{
NameObject(
"/Rect"
): rect # 确保键和值是 PdfObject
}
)
elif "/S" in action and action["/S"] == "/URI":
# 外部链接跳转到某个URI
uri = action.get("/URI")
output_writer.addPage(new_page)
# Save the merged PDF file
with open(output_path, "wb") as output_file:
output_writer.write(output_file)
def _merge_pdfs_legacy(pdf1_path, pdf2_path, output_path):
import PyPDF2 # PyPDF2这个库有严重的内存泄露问题,把它放到子进程中运行,从而方便内存的释放
Percent = 0.95

查看文件

@@ -1,5 +1,6 @@
import time, logging, json, sys, struct
import time, json, sys, struct
import numpy as np
from loguru import logger as logging
from scipy.io.wavfile import WAVE_FORMAT
def write_numpy_to_wave(filename, rate, data, add_header=False):
@@ -106,18 +107,14 @@ def is_speaker_speaking(vad, data, sample_rate):
class AliyunASR():
def test_on_sentence_begin(self, message, *args):
# print("test_on_sentence_begin:{}".format(message))
pass
def test_on_sentence_end(self, message, *args):
# print("test_on_sentence_end:{}".format(message))
message = json.loads(message)
self.parsed_sentence = message['payload']['result']
self.event_on_entence_end.set()
# print(self.parsed_sentence)
def test_on_start(self, message, *args):
# print("test_on_start:{}".format(message))
pass
def test_on_error(self, message, *args):
@@ -129,13 +126,11 @@ class AliyunASR():
pass
def test_on_result_chg(self, message, *args):
# print("test_on_chg:{}".format(message))
message = json.loads(message)
self.parsed_text = message['payload']['result']
self.event_on_result_chg.set()
def test_on_completed(self, message, *args):
# print("on_completed:args=>{} message=>{}".format(args, message))
pass
def audio_convertion_thread(self, uuid):
@@ -248,14 +243,14 @@ class AliyunASR():
try:
response = client.do_action_with_exception(request)
print(response)
logging.info(response)
jss = json.loads(response)
if 'Token' in jss and 'Id' in jss['Token']:
token = jss['Token']['Id']
expireTime = jss['Token']['ExpireTime']
print("token = " + token)
print("expireTime = " + str(expireTime))
logging.info("token = " + token)
logging.info("expireTime = " + str(expireTime))
except Exception as e:
print(e)
logging.error(e)
return token

查看文件

@@ -1,4 +1,5 @@
from crazy_functions.ipc_fns.mp import run_in_subprocess_with_timeout
from loguru import logger
def force_breakdown(txt, limit, get_token_fn):
""" 当无法用标点、空行分割时,我们用最暴力的方法切割
@@ -76,7 +77,7 @@ def cut(limit, get_token_fn, txt_tocut, must_break_at_empty_line, break_anyway=F
remain_txt_to_cut = post
remain_txt_to_cut, remain_txt_to_cut_storage = maintain_storage(remain_txt_to_cut, remain_txt_to_cut_storage)
process = fin_len/total_len
print(f'正在文本切分 {int(process*100)}%')
logger.info(f'正在文本切分 {int(process*100)}%')
if len(remain_txt_to_cut.strip()) == 0:
break
return res
@@ -119,7 +120,7 @@ if __name__ == '__main__':
for i in range(5):
file_content += file_content
print(len(file_content))
logger.info(len(file_content))
TOKEN_LIMIT_PER_FRAGMENT = 2500
res = breakdown_text_to_satisfy_token_limit(file_content, TOKEN_LIMIT_PER_FRAGMENT)

查看文件

@@ -4,7 +4,7 @@ from toolbox import promote_file_to_downloadzone
from toolbox import write_history_to_file, promote_file_to_downloadzone
from toolbox import get_conf
from toolbox import ProxyNetworkActivate
from colorful import *
from shared_utils.colorful import *
import requests
import random
import copy
@@ -72,7 +72,7 @@ def produce_report_markdown(gpt_response_collection, meta, paper_meta_info, chat
generated_conclusion_files.append(res_path)
return res_path
def translate_pdf(article_dict, llm_kwargs, chatbot, fp, generated_conclusion_files, TOKEN_LIMIT_PER_FRAGMENT, DST_LANG):
def translate_pdf(article_dict, llm_kwargs, chatbot, fp, generated_conclusion_files, TOKEN_LIMIT_PER_FRAGMENT, DST_LANG, plugin_kwargs={}):
from crazy_functions.pdf_fns.report_gen_html import construct_html
from crazy_functions.pdf_fns.breakdown_txt import breakdown_text_to_satisfy_token_limit
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
@@ -138,7 +138,7 @@ def translate_pdf(article_dict, llm_kwargs, chatbot, fp, generated_conclusion_fi
chatbot=chatbot,
history_array=[meta for _ in inputs_array],
sys_prompt_array=[
"请你作为一个学术翻译,负责把学术论文准确翻译成中文。注意文章中的每一句话都要翻译。" for _ in inputs_array],
"请你作为一个学术翻译,负责把学术论文准确翻译成中文。注意文章中的每一句话都要翻译。" + plugin_kwargs.get("additional_prompt", "") for _ in inputs_array],
)
# -=-=-=-=-=-=-=-= 写出Markdown文件 -=-=-=-=-=-=-=-=
produce_report_markdown(gpt_response_collection, meta, paper_meta_info, chatbot, fp, generated_conclusion_files)

查看文件

@@ -0,0 +1,26 @@
import os
from toolbox import CatchException, report_exception, get_log_folder, gen_time_str, check_packages
from toolbox import update_ui, promote_file_to_downloadzone, update_ui_lastest_msg, disable_auto_promotion
from toolbox import write_history_to_file, promote_file_to_downloadzone, get_conf, extract_archive
from crazy_functions.pdf_fns.parse_pdf import parse_pdf, translate_pdf
def 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, grobid_url):
import copy, json
TOKEN_LIMIT_PER_FRAGMENT = 1024
generated_conclusion_files = []
generated_html_files = []
DST_LANG = "中文"
from crazy_functions.pdf_fns.report_gen_html import construct_html
for index, fp in enumerate(file_manifest):
chatbot.append(["当前进度:", f"正在连接GROBID服务,请稍候: {grobid_url}\n如果等待时间过长,请修改config中的GROBID_URL,可修改成本地GROBID服务。"]); yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
article_dict = parse_pdf(fp, grobid_url)
grobid_json_res = os.path.join(get_log_folder(), gen_time_str() + "grobid.json")
with open(grobid_json_res, 'w+', encoding='utf8') as f:
f.write(json.dumps(article_dict, indent=4, ensure_ascii=False))
promote_file_to_downloadzone(grobid_json_res, chatbot=chatbot)
if article_dict is None: raise RuntimeError("解析PDF失败,请检查PDF是否损坏。")
yield from translate_pdf(article_dict, llm_kwargs, chatbot, fp, generated_conclusion_files, TOKEN_LIMIT_PER_FRAGMENT, DST_LANG, plugin_kwargs=plugin_kwargs)
chatbot.append(("给出输出文件清单", str(generated_conclusion_files + generated_html_files)))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面

查看文件

@@ -1,83 +1,16 @@
from toolbox import CatchException, report_exception, get_log_folder, gen_time_str, check_packages
from toolbox import update_ui, promote_file_to_downloadzone, update_ui_lastest_msg, disable_auto_promotion
from toolbox import get_log_folder
from toolbox import update_ui, promote_file_to_downloadzone
from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from .crazy_utils import read_and_clean_pdf_text
from .pdf_fns.parse_pdf import parse_pdf, get_avail_grobid_url, translate_pdf
from colorful import *
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from crazy_functions.crazy_utils import read_and_clean_pdf_text
from shared_utils.colorful import *
from loguru import logger
import os
@CatchException
def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
disable_auto_promotion(chatbot)
# 基本信息:功能、贡献者
chatbot.append([
"函数插件功能?",
"批量翻译PDF文档。函数插件贡献者: Binary-Husky"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# 尝试导入依赖,如果缺少依赖,则给出安装建议
try:
check_packages(["fitz", "tiktoken", "scipdf"])
except:
report_exception(chatbot, history,
a=f"解析项目: {txt}",
b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade pymupdf tiktoken scipdf_parser```。")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# 清空历史,以免输入溢出
history = []
from .crazy_utils import get_files_from_everything
success, file_manifest, project_folder = get_files_from_everything(txt, type='.pdf')
# 检测输入参数,如没有给定输入参数,直接退出
if not success:
if txt == "": txt = '空空如也的输入栏'
# 如果没找到任何文件
if len(file_manifest) == 0:
report_exception(chatbot, history,
a=f"解析项目: {txt}", b=f"找不到任何.pdf拓展名的文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
# 开始正式执行任务
grobid_url = get_avail_grobid_url()
if grobid_url is not None:
yield from 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, grobid_url)
else:
yield from update_ui_lastest_msg("GROBID服务不可用,请检查config中的GROBID_URL。作为替代,现在将执行效果稍差的旧版代码。", chatbot, history, delay=3)
yield from 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
def 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, grobid_url):
import copy, json
TOKEN_LIMIT_PER_FRAGMENT = 1024
generated_conclusion_files = []
generated_html_files = []
DST_LANG = "中文"
from crazy_functions.pdf_fns.report_gen_html import construct_html
for index, fp in enumerate(file_manifest):
chatbot.append(["当前进度:", f"正在连接GROBID服务,请稍候: {grobid_url}\n如果等待时间过长,请修改config中的GROBID_URL,可修改成本地GROBID服务。"]); yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
article_dict = parse_pdf(fp, grobid_url)
grobid_json_res = os.path.join(get_log_folder(), gen_time_str() + "grobid.json")
with open(grobid_json_res, 'w+', encoding='utf8') as f:
f.write(json.dumps(article_dict, indent=4, ensure_ascii=False))
promote_file_to_downloadzone(grobid_json_res, chatbot=chatbot)
if article_dict is None: raise RuntimeError("解析PDF失败,请检查PDF是否损坏。")
yield from translate_pdf(article_dict, llm_kwargs, chatbot, fp, generated_conclusion_files, TOKEN_LIMIT_PER_FRAGMENT, DST_LANG)
chatbot.append(("给出输出文件清单", str(generated_conclusion_files + generated_html_files)))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
def 解析PDF_简单拆解(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
"""
此函数已经弃用
注意此函数已经弃用新函数位于crazy_functions/pdf_fns/parse_pdf.py
"""
import copy
TOKEN_LIMIT_PER_FRAGMENT = 1024
@@ -116,7 +49,8 @@ def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot,
chatbot=chatbot,
history_array=[[paper_meta] for _ in paper_fragments],
sys_prompt_array=[
"请你作为一个学术翻译,负责把学术论文准确翻译成中文。注意文章中的每一句话都要翻译。" for _ in paper_fragments],
"请你作为一个学术翻译,负责把学术论文准确翻译成中文。注意文章中的每一句话都要翻译。" + plugin_kwargs.get("additional_prompt", "")
for _ in paper_fragments],
# max_workers=5 # OpenAI所允许的最大并行过载
)
gpt_response_collection_md = copy.deepcopy(gpt_response_collection)
@@ -160,7 +94,7 @@ def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot,
generated_html_files.append(ch.save_file(create_report_file_name))
except:
from toolbox import trimmed_format_exc
print('writing html result failed:', trimmed_format_exc())
logger.error('writing html result failed:', trimmed_format_exc())
# 准备文件的下载
for pdf_path in generated_conclusion_files:

查看文件

@@ -0,0 +1,250 @@
from toolbox import get_log_folder, gen_time_str, get_conf
from toolbox import update_ui, promote_file_to_downloadzone
from toolbox import promote_file_to_downloadzone, extract_archive
from toolbox import generate_file_link, zip_folder
from crazy_functions.crazy_utils import get_files_from_everything
from shared_utils.colorful import *
from loguru import logger
import os
import time
def refresh_key(doc2x_api_key):
import requests, json
url = "https://api.doc2x.noedgeai.com/api/token/refresh"
res = requests.post(
url,
headers={"Authorization": "Bearer " + doc2x_api_key}
)
res_json = []
if res.status_code == 200:
decoded = res.content.decode("utf-8")
res_json = json.loads(decoded)
doc2x_api_key = res_json['data']['token']
else:
raise RuntimeError(format("[ERROR] status code: %d, body: %s" % (res.status_code, res.text)))
return doc2x_api_key
def 解析PDF_DOC2X_转Latex(pdf_file_path):
zip_file_path, unzipped_folder = 解析PDF_DOC2X(pdf_file_path, format='tex')
return unzipped_folder
def 解析PDF_DOC2X(pdf_file_path, format='tex'):
"""
format: 'tex', 'md', 'docx'
"""
import requests, json, os
DOC2X_API_KEY = get_conf('DOC2X_API_KEY')
latex_dir = get_log_folder(plugin_name="pdf_ocr_latex")
markdown_dir = get_log_folder(plugin_name="pdf_ocr")
doc2x_api_key = DOC2X_API_KEY
# < ------ 第1步上传 ------ >
logger.info("Doc2x 第1步上传")
with open(pdf_file_path, 'rb') as file:
res = requests.post(
"https://v2.doc2x.noedgeai.com/api/v2/parse/pdf",
headers={"Authorization": "Bearer " + doc2x_api_key},
data=file
)
# res_json = []
if res.status_code == 200:
res_json = res.json()
else:
raise RuntimeError(f"Doc2x return an error: {res.json()}")
uuid = res_json['data']['uid']
# < ------ 第2步轮询等待 ------ >
logger.info("Doc2x 第2步轮询等待")
params = {'uid': uuid}
while True:
res = requests.get(
'https://v2.doc2x.noedgeai.com/api/v2/parse/status',
headers={"Authorization": "Bearer " + doc2x_api_key},
params=params
)
res_json = res.json()
if res_json['data']['status'] == "success":
break
elif res_json['data']['status'] == "processing":
time.sleep(3)
logger.info(f"Doc2x is processing at {res_json['data']['progress']}%")
elif res_json['data']['status'] == "failed":
raise RuntimeError(f"Doc2x return an error: {res_json}")
# < ------ 第3步提交转化 ------ >
logger.info("Doc2x 第3步提交转化")
data = {
"uid": uuid,
"to": format,
"formula_mode": "dollar",
"filename": "output"
}
res = requests.post(
'https://v2.doc2x.noedgeai.com/api/v2/convert/parse',
headers={"Authorization": "Bearer " + doc2x_api_key},
json=data
)
if res.status_code == 200:
res_json = res.json()
else:
raise RuntimeError(f"Doc2x return an error: {res.json()}")
# < ------ 第4步等待结果 ------ >
logger.info("Doc2x 第4步等待结果")
params = {'uid': uuid}
while True:
res = requests.get(
'https://v2.doc2x.noedgeai.com/api/v2/convert/parse/result',
headers={"Authorization": "Bearer " + doc2x_api_key},
params=params
)
res_json = res.json()
if res_json['data']['status'] == "success":
break
elif res_json['data']['status'] == "processing":
time.sleep(3)
logger.info(f"Doc2x still processing")
elif res_json['data']['status'] == "failed":
raise RuntimeError(f"Doc2x return an error: {res_json}")
# < ------ 第5步最后的处理 ------ >
logger.info("Doc2x 第5步最后的处理")
if format=='tex':
target_path = latex_dir
if format=='md':
target_path = markdown_dir
os.makedirs(target_path, exist_ok=True)
max_attempt = 3
# < ------ 下载 ------ >
for attempt in range(max_attempt):
try:
result_url = res_json['data']['url']
res = requests.get(result_url)
zip_path = os.path.join(target_path, gen_time_str() + '.zip')
unzip_path = os.path.join(target_path, gen_time_str())
if res.status_code == 200:
with open(zip_path, "wb") as f: f.write(res.content)
else:
raise RuntimeError(f"Doc2x return an error: {res.json()}")
except Exception as e:
if attempt < max_attempt - 1:
logger.error(f"Failed to download latex file, retrying... {e}")
time.sleep(3)
continue
else:
raise e
# < ------ 解压 ------ >
import zipfile
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
zip_ref.extractall(unzip_path)
return zip_path, unzip_path
def 解析PDF_DOC2X_单文件(fp, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, DOC2X_API_KEY, user_request):
def pdf2markdown(filepath):
chatbot.append((None, f"Doc2x 解析中"))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
md_zip_path, unzipped_folder = 解析PDF_DOC2X(filepath, format='md')
promote_file_to_downloadzone(md_zip_path, chatbot=chatbot)
chatbot.append((None, f"完成解析 {md_zip_path} ..."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return md_zip_path
def deliver_to_markdown_plugin(md_zip_path, user_request):
from crazy_functions.Markdown_Translate import Markdown英译中
import shutil, re
time_tag = gen_time_str()
target_path_base = get_log_folder(chatbot.get_user())
file_origin_name = os.path.basename(md_zip_path)
this_file_path = os.path.join(target_path_base, file_origin_name)
os.makedirs(target_path_base, exist_ok=True)
shutil.copyfile(md_zip_path, this_file_path)
ex_folder = this_file_path + ".extract"
extract_archive(
file_path=this_file_path, dest_dir=ex_folder
)
# edit markdown files
success, file_manifest, project_folder = get_files_from_everything(ex_folder, type='.md')
for generated_fp in file_manifest:
# 修正一些公式问题
with open(generated_fp, 'r', encoding='utf8') as f:
content = f.read()
# 将公式中的\[ \]替换成$$
content = content.replace(r'\[', r'$$').replace(r'\]', r'$$')
# 将公式中的\( \)替换成$
content = content.replace(r'\(', r'$').replace(r'\)', r'$')
content = content.replace('```markdown', '\n').replace('```', '\n')
with open(generated_fp, 'w', encoding='utf8') as f:
f.write(content)
promote_file_to_downloadzone(generated_fp, chatbot=chatbot)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# 生成在线预览html
file_name = '在线预览翻译(原文)' + gen_time_str() + '.html'
preview_fp = os.path.join(ex_folder, file_name)
from shared_utils.advanced_markdown_format import markdown_convertion_for_file
with open(generated_fp, "r", encoding="utf-8") as f:
md = f.read()
# # Markdown中使用不标准的表格,需要在表格前加上一个emoji,以便公式渲染
# md = re.sub(r'^<table>', r'.<table>', md, flags=re.MULTILINE)
html = markdown_convertion_for_file(md)
with open(preview_fp, "w", encoding="utf-8") as f: f.write(html)
chatbot.append([None, f"生成在线预览:{generate_file_link([preview_fp])}"])
promote_file_to_downloadzone(preview_fp, chatbot=chatbot)
chatbot.append((None, f"调用Markdown插件 {ex_folder} ..."))
plugin_kwargs['markdown_expected_output_dir'] = ex_folder
translated_f_name = 'translated_markdown.md'
generated_fp = plugin_kwargs['markdown_expected_output_path'] = os.path.join(ex_folder, translated_f_name)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
yield from Markdown英译中(ex_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
if os.path.exists(generated_fp):
# 修正一些公式问题
with open(generated_fp, 'r', encoding='utf8') as f: content = f.read()
content = content.replace('```markdown', '\n').replace('```', '\n')
# Markdown中使用不标准的表格,需要在表格前加上一个emoji,以便公式渲染
# content = re.sub(r'^<table>', r'.<table>', content, flags=re.MULTILINE)
with open(generated_fp, 'w', encoding='utf8') as f: f.write(content)
# 生成在线预览html
file_name = '在线预览翻译' + gen_time_str() + '.html'
preview_fp = os.path.join(ex_folder, file_name)
from shared_utils.advanced_markdown_format import markdown_convertion_for_file
with open(generated_fp, "r", encoding="utf-8") as f:
md = f.read()
html = markdown_convertion_for_file(md)
with open(preview_fp, "w", encoding="utf-8") as f: f.write(html)
promote_file_to_downloadzone(preview_fp, chatbot=chatbot)
# 生成包含图片的压缩包
dest_folder = get_log_folder(chatbot.get_user())
zip_name = '翻译后的带图文档.zip'
zip_folder(source_folder=ex_folder, dest_folder=dest_folder, zip_name=zip_name)
zip_fp = os.path.join(dest_folder, zip_name)
promote_file_to_downloadzone(zip_fp, chatbot=chatbot)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
md_zip_path = yield from pdf2markdown(fp)
yield from deliver_to_markdown_plugin(md_zip_path, user_request)
def 解析PDF_基于DOC2X(file_manifest, *args):
for index, fp in enumerate(file_manifest):
yield from 解析PDF_DOC2X_单文件(fp, *args)
return

查看文件

@@ -0,0 +1,85 @@
from crazy_functions.crazy_utils import read_and_clean_pdf_text, get_files_from_everything
import os
import re
def extract_text_from_files(txt, chatbot, history):
"""
查找pdf/md/word并获取文本内容并返回状态以及文本
输入参数 Args:
chatbot: chatbot inputs and outputs (用户界面对话窗口句柄,用于数据流可视化)
history (list): List of chat history (历史,对话历史列表)
输出 Returns:
文件是否存在(bool)
final_result(list):文本内容
page_one(list):第一页内容/摘要
file_manifest(list):文件路径
excption(string):需要用户手动处理的信息,如没出错则保持为空
"""
final_result = []
page_one = []
file_manifest = []
excption = ""
if txt == "":
final_result.append(txt)
return False, final_result, page_one, file_manifest, excption #如输入区内容不是文件则直接返回输入区内容
#查找输入区内容中的文件
file_pdf,pdf_manifest,folder_pdf = get_files_from_everything(txt, '.pdf')
file_md,md_manifest,folder_md = get_files_from_everything(txt, '.md')
file_word,word_manifest,folder_word = get_files_from_everything(txt, '.docx')
file_doc,doc_manifest,folder_doc = get_files_from_everything(txt, '.doc')
if file_doc:
excption = "word"
return False, final_result, page_one, file_manifest, excption
file_num = len(pdf_manifest) + len(md_manifest) + len(word_manifest)
if file_num == 0:
final_result.append(txt)
return False, final_result, page_one, file_manifest, excption #如输入区内容不是文件则直接返回输入区内容
if file_pdf:
try: # 尝试导入依赖,如果缺少依赖,则给出安装建议
import fitz
except:
excption = "pdf"
return False, final_result, page_one, file_manifest, excption
for index, fp in enumerate(pdf_manifest):
file_content, pdf_one = read_and_clean_pdf_text(fp) # 尝试按照章节切割PDF
file_content = file_content.encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
pdf_one = str(pdf_one).encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
final_result.append(file_content)
page_one.append(pdf_one)
file_manifest.append(os.path.relpath(fp, folder_pdf))
if file_md:
for index, fp in enumerate(md_manifest):
with open(fp, 'r', encoding='utf-8', errors='replace') as f:
file_content = f.read()
file_content = file_content.encode('utf-8', 'ignore').decode()
headers = re.findall(r'^#\s(.*)$', file_content, re.MULTILINE) #接下来提取md中的一级/二级标题作为摘要
if len(headers) > 0:
page_one.append("\n".join(headers)) #合并所有的标题,以换行符分割
else:
page_one.append("")
final_result.append(file_content)
file_manifest.append(os.path.relpath(fp, folder_md))
if file_word:
try: # 尝试导入依赖,如果缺少依赖,则给出安装建议
from docx import Document
except:
excption = "word_pip"
return False, final_result, page_one, file_manifest, excption
for index, fp in enumerate(word_manifest):
doc = Document(fp)
file_content = '\n'.join([p.text for p in doc.paragraphs])
file_content = file_content.encode('utf-8', 'ignore').decode()
page_one.append(file_content[:200])
final_result.append(file_content)
file_manifest.append(os.path.relpath(fp, folder_word))
return True, final_result, page_one, file_manifest, excption

查看文件

@@ -0,0 +1,73 @@
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>GPT-Academic 翻译报告书</title>
<style>
.centered-a {
color: red;
text-align: center;
margin-bottom: 2%;
font-size: 1.5em;
}
.centered-b {
color: red;
text-align: center;
margin-top: 10%;
margin-bottom: 20%;
font-size: 1.5em;
}
.centered-c {
color: rgba(255, 0, 0, 0);
text-align: center;
margin-top: 2%;
margin-bottom: 20%;
font-size: 7em;
}
</style>
<script>
// Configure MathJax settings
MathJax = {
tex: {
inlineMath: [
['$', '$'],
['\(', '\)']
]
}
}
addEventListener('zero-md-rendered', () => {MathJax.typeset(); console.log('MathJax typeset!');})
</script>
<!-- Load MathJax library -->
<script src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js"></script>
<script
type="module"
src="https://cdn.jsdelivr.net/gh/zerodevx/zero-md@2/dist/zero-md.min.js"
></script>
</head>
<body>
<div class="test_temp1" style="width:10%; height: 500px; float:left;">
</div>
<div class="test_temp2" style="width:80%; height: 500px; float:left;">
<!-- Simply set the `src` attribute to your MD file and win -->
<div class="centered-a">
请按Ctrl+S保存此页面,否则该页面可能在几分钟后失效。
</div>
<zero-md src="translated_markdown.md" no-shadow>
</zero-md>
<div class="centered-b">
本报告由GPT-Academic开源项目生成,地址https://github.com/binary-husky/gpt_academic。
</div>
<div class="centered-c">
本报告由GPT-Academic开源项目生成,地址https://github.com/binary-husky/gpt_academic。
</div>
</div>
<div class="test_temp3" style="width:10%; height: 500px; float:left;">
</div>
</body>
</html>

查看文件

@@ -0,0 +1,52 @@
import os, json, base64
from pydantic import BaseModel, Field
from textwrap import dedent
from typing import List
class ArgProperty(BaseModel): # PLUGIN_ARG_MENU
title: str = Field(description="The title", default="")
description: str = Field(description="The description", default="")
default_value: str = Field(description="The default value", default="")
type: str = Field(description="The type", default="") # currently we support ['string', 'dropdown']
options: List[str] = Field(default=[], description="List of options available for the argument") # only used when type is 'dropdown'
class GptAcademicPluginTemplate():
def __init__(self):
# please note that `execute` method may run in different threads,
# thus you should not store any state in the plugin instance,
# which may be accessed by multiple threads
pass
def define_arg_selection_menu(self):
"""
An example as below:
```
def define_arg_selection_menu(self):
gui_definition = {
"main_input":
ArgProperty(title="main input", description="description", default_value="default_value", type="string").model_dump_json(),
"advanced_arg":
ArgProperty(title="advanced arguments", description="description", default_value="default_value", type="string").model_dump_json(),
"additional_arg_01":
ArgProperty(title="additional", description="description", default_value="default_value", type="string").model_dump_json(),
}
return gui_definition
```
"""
raise NotImplementedError("You need to implement this method in your plugin class")
def get_js_code_for_generating_menu(self, btnName):
define_arg_selection = self.define_arg_selection_menu()
if len(define_arg_selection.keys()) > 8:
raise ValueError("You can only have up to 8 arguments in the define_arg_selection")
# if "main_input" not in define_arg_selection:
# raise ValueError("You must have a 'main_input' in the define_arg_selection")
DEFINE_ARG_INPUT_INTERFACE = json.dumps(define_arg_selection)
return base64.b64encode(DEFINE_ARG_INPUT_INTERFACE.encode('utf-8')).decode('utf-8')
def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
raise NotImplementedError("You need to implement this method in your plugin class")

查看文件

@@ -0,0 +1,87 @@
SearchOptimizerPrompt="""作为一个网页搜索助手,你的任务是结合历史记录,从不同角度,为“原问题”生成个不同版本的“检索词”,从而提高网页检索的精度。生成的问题要求指向对象清晰明确,并与“原问题语言相同”。例如:
历史记录:
"
Q: 对话背景。
A: 当前对话是关于 Nginx 的介绍和在Ubuntu上的使用等。
"
原问题: 怎么下载
检索词: ["Nginx 下载","Ubuntu Nginx","Ubuntu安装Nginx"]
----------------
历史记录:
"
Q: 对话背景。
A: 当前对话是关于 Nginx 的介绍和使用等。
Q: 报错 "no connection"
A: 报错"no connection"可能是因为……
"
原问题: 怎么解决
检索词: ["Nginx报错"no connection" 解决","Nginx'no connection'报错 原因","Nginx提示'no connection'"]
----------------
历史记录:
"
"
原问题: 你知道 Python 么?
检索词: ["Python","Python 使用教程。","Python 特点和优势"]
----------------
历史记录:
"
Q: 列出Java的三种特点?
A: 1. Java 是一种编译型语言。
2. Java 是一种面向对象的编程语言。
3. Java 是一种跨平台的编程语言。
"
原问题: 介绍下第2点。
检索词: ["Java 面向对象特点","Java 面向对象编程优势。","Java 面向对象编程"]
----------------
现在有历史记录:
"
{history}
"
有其原问题: {query}
直接给出最多{num}个检索词,必须以json形式给出,不得有多余字符:
"""
SearchAcademicOptimizerPrompt="""作为一个学术论文搜索助手,你的任务是结合历史记录,从不同角度,为“原问题”生成个不同版本的“检索词”,从而提高学术论文检索的精度。生成的问题要求指向对象清晰明确,并与“原问题语言相同”。例如:
历史记录:
"
Q: 对话背景。
A: 当前对话是关于深度学习的介绍和在图像识别中的应用等。
"
原问题: 怎么下载相关论文
检索词: ["深度学习 图像识别 论文下载","图像识别 深度学习 研究论文","深度学习 图像识别 论文资源","Deep Learning Image Recognition Paper Download","Image Recognition Deep Learning Research Paper"]
----------------
历史记录:
"
Q: 对话背景。
A: 当前对话是关于深度学习的介绍和应用等。
Q: 报错 "模型不收敛"
A: 报错"模型不收敛"可能是因为……
"
原问题: 怎么解决
检索词: ["深度学习 模型不收敛 解决方案 论文","深度学习 模型不收敛 原因 研究","深度学习 模型不收敛 论文","Deep Learning Model Convergence Issue Solution Paper","Deep Learning Model Convergence Problem Research"]
----------------
历史记录:
"
"
原问题: 你知道 GAN 么?
检索词: ["生成对抗网络 论文","GAN 使用教程 论文","GAN 特点和优势 研究","Generative Adversarial Network Paper","GAN Usage Tutorial Paper"]
----------------
历史记录:
"
Q: 列出机器学习的三种应用?
A: 1. 机器学习在图像识别中的应用。
2. 机器学习在自然语言处理中的应用。
3. 机器学习在推荐系统中的应用。
"
原问题: 介绍下第2点。
检索词: ["机器学习 自然语言处理 应用 论文","机器学习 自然语言处理 研究","机器学习 NLP 应用 论文","Machine Learning Natural Language Processing Application Paper","Machine Learning NLP Research"]
----------------
现在有历史记录:
"
{history}
"
有其原问题: {query}
直接给出最多{num}个检索词,必须以json形式给出,不得有多余字符:
"""

查看文件

@@ -0,0 +1,130 @@
import llama_index
import os
import atexit
from loguru import logger
from typing import List
from llama_index.core import Document
from llama_index.core.schema import TextNode
from request_llms.embed_models.openai_embed import OpenAiEmbeddingModel
from shared_utils.connect_void_terminal import get_chat_default_kwargs
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from crazy_functions.rag_fns.vector_store_index import GptacVectorStoreIndex
from llama_index.core.ingestion import run_transformations
from llama_index.core import PromptTemplate
from llama_index.core.response_synthesizers import TreeSummarize
DEFAULT_QUERY_GENERATION_PROMPT = """\
Now, you have context information as below:
---------------------
{context_str}
---------------------
Answer the user request below (use the context information if necessary, otherwise you can ignore them):
---------------------
{query_str}
"""
QUESTION_ANSWER_RECORD = """\
{{
"type": "This is a previous conversation with the user",
"question": "{question}",
"answer": "{answer}",
}}
"""
class SaveLoad():
def does_checkpoint_exist(self, checkpoint_dir=None):
import os, glob
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
if not os.path.exists(checkpoint_dir): return False
if len(glob.glob(os.path.join(checkpoint_dir, "*.json"))) == 0: return False
return True
def save_to_checkpoint(self, checkpoint_dir=None):
logger.info(f'saving vector store to: {checkpoint_dir}')
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
self.vs_index.storage_context.persist(persist_dir=checkpoint_dir)
def load_from_checkpoint(self, checkpoint_dir=None):
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
if self.does_checkpoint_exist(checkpoint_dir=checkpoint_dir):
logger.info('loading checkpoint from disk')
from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir=checkpoint_dir)
self.vs_index = load_index_from_storage(storage_context, embed_model=self.embed_model)
return self.vs_index
else:
return self.create_new_vs()
def create_new_vs(self):
return GptacVectorStoreIndex.default_vector_store(embed_model=self.embed_model)
def purge(self):
import shutil
shutil.rmtree(self.checkpoint_dir, ignore_errors=True)
self.vs_index = self.create_new_vs()
class LlamaIndexRagWorker(SaveLoad):
def __init__(self, user_name, llm_kwargs, auto_load_checkpoint=True, checkpoint_dir=None) -> None:
self.debug_mode = True
self.embed_model = OpenAiEmbeddingModel(llm_kwargs)
self.user_name = user_name
self.checkpoint_dir = checkpoint_dir
if auto_load_checkpoint:
self.vs_index = self.load_from_checkpoint(checkpoint_dir)
else:
self.vs_index = self.create_new_vs(checkpoint_dir)
atexit.register(lambda: self.save_to_checkpoint(checkpoint_dir))
def assign_embedding_model(self):
pass
def inspect_vector_store(self):
# This function is for debugging
self.vs_index.storage_context.index_store.to_dict()
docstore = self.vs_index.storage_context.docstore.docs
vector_store_preview = "\n".join([ f"{_id} | {tn.text}" for _id, tn in docstore.items() ])
logger.info('\n++ --------inspect_vector_store begin--------')
logger.info(vector_store_preview)
logger.info('oo --------inspect_vector_store end--------')
return vector_store_preview
def add_documents_to_vector_store(self, document_list):
documents = [Document(text=t) for t in document_list]
documents_nodes = run_transformations(
documents, # type: ignore
self.vs_index._transformations,
show_progress=True
)
self.vs_index.insert_nodes(documents_nodes)
if self.debug_mode: self.inspect_vector_store()
def add_text_to_vector_store(self, text):
node = TextNode(text=text)
documents_nodes = run_transformations(
[node],
self.vs_index._transformations,
show_progress=True
)
self.vs_index.insert_nodes(documents_nodes)
if self.debug_mode: self.inspect_vector_store()
def remember_qa(self, question, answer):
formatted_str = QUESTION_ANSWER_RECORD.format(question=question, answer=answer)
self.add_text_to_vector_store(formatted_str)
def retrieve_from_store_with_query(self, query):
if self.debug_mode: self.inspect_vector_store()
retriever = self.vs_index.as_retriever()
return retriever.retrieve(query)
def build_prompt(self, query, nodes):
context_str = self.generate_node_array_preview(nodes)
return DEFAULT_QUERY_GENERATION_PROMPT.format(context_str=context_str, query_str=query)
def generate_node_array_preview(self, nodes):
buf = "\n".join(([f"(No.{i+1} | score {n.score:.3f}): {n.text}" for i, n in enumerate(nodes)]))
if self.debug_mode: logger.info(buf)
return buf

查看文件

@@ -0,0 +1,108 @@
import llama_index
import os
import atexit
from typing import List
from loguru import logger
from llama_index.core import Document
from llama_index.core.schema import TextNode
from request_llms.embed_models.openai_embed import OpenAiEmbeddingModel
from shared_utils.connect_void_terminal import get_chat_default_kwargs
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from crazy_functions.rag_fns.vector_store_index import GptacVectorStoreIndex
from llama_index.core.ingestion import run_transformations
from llama_index.core import PromptTemplate
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core import StorageContext
from llama_index.vector_stores.milvus import MilvusVectorStore
from crazy_functions.rag_fns.llama_index_worker import LlamaIndexRagWorker
DEFAULT_QUERY_GENERATION_PROMPT = """\
Now, you have context information as below:
---------------------
{context_str}
---------------------
Answer the user request below (use the context information if necessary, otherwise you can ignore them):
---------------------
{query_str}
"""
QUESTION_ANSWER_RECORD = """\
{{
"type": "This is a previous conversation with the user",
"question": "{question}",
"answer": "{answer}",
}}
"""
class MilvusSaveLoad():
def does_checkpoint_exist(self, checkpoint_dir=None):
import os, glob
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
if not os.path.exists(checkpoint_dir): return False
if len(glob.glob(os.path.join(checkpoint_dir, "*.json"))) == 0: return False
return True
def save_to_checkpoint(self, checkpoint_dir=None):
logger.info(f'saving vector store to: {checkpoint_dir}')
# if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
# self.vs_index.storage_context.persist(persist_dir=checkpoint_dir)
def load_from_checkpoint(self, checkpoint_dir=None):
if checkpoint_dir is None: checkpoint_dir = self.checkpoint_dir
if self.does_checkpoint_exist(checkpoint_dir=checkpoint_dir):
logger.info('loading checkpoint from disk')
from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir=checkpoint_dir)
try:
self.vs_index = load_index_from_storage(storage_context, embed_model=self.embed_model)
return self.vs_index
except:
return self.create_new_vs(checkpoint_dir)
else:
return self.create_new_vs(checkpoint_dir)
def create_new_vs(self, checkpoint_dir, overwrite=False):
vector_store = MilvusVectorStore(
uri=os.path.join(checkpoint_dir, "milvus_demo.db"),
dim=self.embed_model.embedding_dimension(),
overwrite=overwrite
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GptacVectorStoreIndex.default_vector_store(storage_context=storage_context, embed_model=self.embed_model)
return index
def purge(self):
self.vs_index = self.create_new_vs(self.checkpoint_dir, overwrite=True)
class MilvusRagWorker(MilvusSaveLoad, LlamaIndexRagWorker):
def __init__(self, user_name, llm_kwargs, auto_load_checkpoint=True, checkpoint_dir=None) -> None:
self.debug_mode = True
self.embed_model = OpenAiEmbeddingModel(llm_kwargs)
self.user_name = user_name
self.checkpoint_dir = checkpoint_dir
if auto_load_checkpoint:
self.vs_index = self.load_from_checkpoint(checkpoint_dir)
else:
self.vs_index = self.create_new_vs(checkpoint_dir)
atexit.register(lambda: self.save_to_checkpoint(checkpoint_dir))
def inspect_vector_store(self):
# This function is for debugging
try:
self.vs_index.storage_context.index_store.to_dict()
docstore = self.vs_index.storage_context.docstore.docs
if not docstore.items():
raise ValueError("cannot inspect")
vector_store_preview = "\n".join([ f"{_id} | {tn.text}" for _id, tn in docstore.items() ])
except:
dummy_retrieve_res: List["NodeWithScore"] = self.vs_index.as_retriever().retrieve(' ')
vector_store_preview = "\n".join(
[f"{node.id_} | {node.text}" for node in dummy_retrieve_res]
)
logger.info('\n++ --------inspect_vector_store begin--------')
logger.info(vector_store_preview)
logger.info('oo --------inspect_vector_store end--------')
return vector_store_preview

查看文件

@@ -0,0 +1,58 @@
from llama_index.core import VectorStoreIndex
from typing import Any, List, Optional
from llama_index.core.callbacks.base import CallbackManager
from llama_index.core.schema import TransformComponent
from llama_index.core.service_context import ServiceContext
from llama_index.core.settings import (
Settings,
callback_manager_from_settings_or_context,
transformations_from_settings_or_context,
)
from llama_index.core.storage.storage_context import StorageContext
class GptacVectorStoreIndex(VectorStoreIndex):
@classmethod
def default_vector_store(
cls,
storage_context: Optional[StorageContext] = None,
show_progress: bool = False,
callback_manager: Optional[CallbackManager] = None,
transformations: Optional[List[TransformComponent]] = None,
# deprecated
service_context: Optional[ServiceContext] = None,
embed_model = None,
**kwargs: Any,
):
"""Create index from documents.
Args:
documents (Optional[Sequence[BaseDocument]]): List of documents to
build the index from.
"""
storage_context = storage_context or StorageContext.from_defaults()
docstore = storage_context.docstore
callback_manager = (
callback_manager
or callback_manager_from_settings_or_context(Settings, service_context)
)
transformations = transformations or transformations_from_settings_or_context(
Settings, service_context
)
with callback_manager.as_trace("index_construction"):
return cls(
nodes=[],
storage_context=storage_context,
callback_manager=callback_manager,
show_progress=show_progress,
transformations=transformations,
service_context=service_context,
embed_model=embed_model,
**kwargs,
)

查看文件

@@ -1,16 +1,17 @@
# From project chatglm-langchain
import threading
from toolbox import Singleton
import os
import shutil
import os
import uuid
import tqdm
import shutil
import threading
import numpy as np
from toolbox import Singleton
from loguru import logger
from langchain.vectorstores import FAISS
from langchain.docstore.document import Document
from typing import List, Tuple
import numpy as np
from crazy_functions.vector_fns.general_file_loader import load_file
embedding_model_dict = {
@@ -150,17 +151,17 @@ class LocalDocQA:
failed_files = []
if isinstance(filepath, str):
if not os.path.exists(filepath):
print("路径不存在")
logger.error("路径不存在")
return None
elif os.path.isfile(filepath):
file = os.path.split(filepath)[-1]
try:
docs = load_file(filepath, SENTENCE_SIZE)
print(f"{file} 已成功加载")
logger.info(f"{file} 已成功加载")
loaded_files.append(filepath)
except Exception as e:
print(e)
print(f"{file} 未能成功加载")
logger.error(e)
logger.error(f"{file} 未能成功加载")
return None
elif os.path.isdir(filepath):
docs = []
@@ -170,23 +171,23 @@ class LocalDocQA:
docs += load_file(fullfilepath, SENTENCE_SIZE)
loaded_files.append(fullfilepath)
except Exception as e:
print(e)
logger.error(e)
failed_files.append(file)
if len(failed_files) > 0:
print("以下文件未能成功加载:")
logger.error("以下文件未能成功加载:")
for file in failed_files:
print(f"{file}\n")
logger.error(f"{file}\n")
else:
docs = []
for file in filepath:
docs += load_file(file, SENTENCE_SIZE)
print(f"{file} 已成功加载")
logger.info(f"{file} 已成功加载")
loaded_files.append(file)
if len(docs) > 0:
print("文件加载完毕,正在生成向量库")
logger.info("文件加载完毕,正在生成向量库")
if vs_path and os.path.isdir(vs_path):
try:
self.vector_store = FAISS.load_local(vs_path, text2vec)
@@ -233,7 +234,7 @@ class LocalDocQA:
prompt += "\n\n".join([f"({k}): " + doc.page_content for k, doc in enumerate(related_docs_with_score)])
prompt += "\n\n---\n\n"
prompt = prompt.encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
# print(prompt)
# logger.info(prompt)
response = {"query": query, "source_documents": related_docs_with_score}
return response, prompt
@@ -262,7 +263,7 @@ def construct_vector_store(vs_id, vs_path, files, sentence_size, history, one_co
else:
pass
# file_status = "文件未成功加载,请重新上传文件"
# print(file_status)
# logger.info(file_status)
return local_doc_qa, vs_path
@Singleton
@@ -278,7 +279,7 @@ class knowledge_archive_interface():
if self.text2vec_large_chinese is None:
# < -------------------预热文本向量化模组--------------- >
from toolbox import ProxyNetworkActivate
print('Checking Text2vec ...')
logger.info('Checking Text2vec ...')
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
with ProxyNetworkActivate('Download_LLM'): # 临时地激活代理网络
self.text2vec_large_chinese = HuggingFaceEmbeddings(model_name="GanymedeNil/text2vec-large-chinese")

查看文件

@@ -10,7 +10,7 @@ def read_avail_plugin_enum():
from crazy_functional import get_crazy_functions
plugin_arr = get_crazy_functions()
# remove plugins with out explaination
plugin_arr = {k:v for k, v in plugin_arr.items() if 'Info' in v}
plugin_arr = {k:v for k, v in plugin_arr.items() if ('Info' in v) and ('Function' in v)}
plugin_arr_info = {"F_{:04d}".format(i):v["Info"] for i, v in enumerate(plugin_arr.values(), start=1)}
plugin_arr_dict = {"F_{:04d}".format(i):v for i, v in enumerate(plugin_arr.values(), start=1)}
plugin_arr_dict_parse = {"F_{:04d}".format(i):v for i, v in enumerate(plugin_arr.values(), start=1)}

查看文件

@@ -1,17 +1,19 @@
import re, requests, unicodedata, os
from toolbox import update_ui, get_log_folder
from toolbox import write_history_to_file, promote_file_to_downloadzone
from toolbox import CatchException, report_exception, get_conf
import re, requests, unicodedata, os
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from loguru import logger
def download_arxiv_(url_pdf):
if 'arxiv.org' not in url_pdf:
if ('.' in url_pdf) and ('/' not in url_pdf):
new_url = 'https://arxiv.org/abs/'+url_pdf
print('下载编号:', url_pdf, '自动定位:', new_url)
logger.info('下载编号:', url_pdf, '自动定位:', new_url)
# download_arxiv_(new_url)
return download_arxiv_(new_url)
else:
print('不能识别的URL')
logger.info('不能识别的URL')
return None
if 'abs' in url_pdf:
url_pdf = url_pdf.replace('abs', 'pdf')
@@ -42,15 +44,12 @@ def download_arxiv_(url_pdf):
requests_pdf_url = url_pdf
file_path = download_dir+title_str
print('下载中')
logger.info('下载中')
proxies = get_conf('proxies')
r = requests.get(requests_pdf_url, proxies=proxies)
with open(file_path, 'wb+') as f:
f.write(r.content)
print('下载完成')
# print('输出下载命令:','aria2c -o \"%s\" %s'%(title_str,url_pdf))
# subprocess.call('aria2c --all-proxy=\"172.18.116.150:11084\" -o \"%s\" %s'%(download_dir+title_str,url_pdf), shell=True)
logger.info('下载完成')
x = "%s %s %s.bib" % (paper_id, other_info['year'], other_info['authors'])
x = x.replace('?', '')\
@@ -63,19 +62,9 @@ def download_arxiv_(url_pdf):
def get_name(_url_):
import os
from bs4 import BeautifulSoup
print('正在获取文献名!')
print(_url_)
# arxiv_recall = {}
# if os.path.exists('./arxiv_recall.pkl'):
# with open('./arxiv_recall.pkl', 'rb') as f:
# arxiv_recall = pickle.load(f)
# if _url_ in arxiv_recall:
# print('在缓存中')
# return arxiv_recall[_url_]
logger.info('正在获取文献名!')
logger.info(_url_)
proxies = get_conf('proxies')
res = requests.get(_url_, proxies=proxies)
@@ -92,7 +81,7 @@ def get_name(_url_):
other_details['abstract'] = abstract
except:
other_details['year'] = ''
print('年份获取失败')
logger.info('年份获取失败')
# get author
try:
@@ -101,7 +90,7 @@ def get_name(_url_):
other_details['authors'] = authors
except:
other_details['authors'] = ''
print('authors获取失败')
logger.info('authors获取失败')
# get comment
try:
@@ -116,11 +105,11 @@ def get_name(_url_):
other_details['comment'] = ''
except:
other_details['comment'] = ''
print('年份获取失败')
logger.info('年份获取失败')
title_str = BeautifulSoup(
res.text, 'html.parser').find('title').contents[0]
print('获取成功:', title_str)
logger.info('获取成功:', title_str)
# arxiv_recall[_url_] = (title_str+'.pdf', other_details)
# with open('./arxiv_recall.pkl', 'wb') as f:
# pickle.dump(arxiv_recall, f)

查看文件

@@ -1,6 +1,5 @@
from toolbox import CatchException, update_ui
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
@CatchException
def 交互功能模板函数(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):

查看文件

@@ -16,8 +16,8 @@ Testing:
from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc, is_the_upload_folder
from toolbox import promote_file_to_downloadzone, get_log_folder, update_ui_lastest_msg
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, get_plugin_arg
from .crazy_utils import input_clipping, try_install_deps
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, get_plugin_arg
from crazy_functions.crazy_utils import input_clipping, try_install_deps
from crazy_functions.gen_fns.gen_fns_shared import is_function_successfully_generated
from crazy_functions.gen_fns.gen_fns_shared import get_class_name
from crazy_functions.gen_fns.gen_fns_shared import subprocess_worker

查看文件

@@ -1,6 +1,6 @@
from toolbox import CatchException, update_ui, gen_time_str
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import input_clipping
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import input_clipping
import copy, json
@CatchException

查看文件

@@ -6,13 +6,14 @@
"""
import time
from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc, ProxyNetworkActivate
from toolbox import get_conf, select_api_key, update_ui_lastest_msg, Singleton
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, get_plugin_arg
from crazy_functions.crazy_utils import input_clipping, try_install_deps
from crazy_functions.agent_fns.persistent import GradioMultiuserManagerForPersistentClasses
from crazy_functions.agent_fns.auto_agent import AutoGenMath
import time
from loguru import logger
def remove_model_prefix(llm):
if llm.startswith('api2d-'): llm = llm.replace('api2d-', '')
@@ -80,12 +81,12 @@ def 多智能体终端(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_
persistent_key = f"{user_uuid}->多智能体终端"
if persistent_class_multi_user_manager.already_alive(persistent_key):
# 当已经存在一个正在运行的多智能体终端时,直接将用户输入传递给它,而不是再次启动一个新的多智能体终端
print('[debug] feed new user input')
logger.info('[debug] feed new user input')
executor = persistent_class_multi_user_manager.get(persistent_key)
exit_reason = yield from executor.main_process_ui_control(txt, create_or_resume="resume")
else:
# 运行多智能体终端 (首次)
print('[debug] create new executor instance')
logger.info('[debug] create new executor instance')
history = []
chatbot.append(["正在启动: 多智能体终端", "插件动态生成, 执行开始, 作者 Microsoft & Binary-Husky."])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面

查看文件

@@ -1,7 +1,7 @@
from toolbox import update_ui
from toolbox import CatchException, report_exception
from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
fast_debug = False

查看文件

@@ -1,5 +1,5 @@
from toolbox import CatchException, report_exception, select_api_key, update_ui, get_conf
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from toolbox import write_history_to_file, promote_file_to_downloadzone, get_log_folder
def split_audio_file(filename, split_duration=1000):

查看文件

@@ -1,16 +1,18 @@
from loguru import logger
from toolbox import update_ui, promote_file_to_downloadzone, gen_time_str
from toolbox import CatchException, report_exception
from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import read_and_clean_pdf_text
from .crazy_utils import input_clipping
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import read_and_clean_pdf_text
from crazy_functions.crazy_utils import input_clipping
def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
file_write_buffer = []
for file_name in file_manifest:
print('begin analysis on:', file_name)
logger.info('begin analysis on:', file_name)
############################## <第 0 步,切割PDF> ##################################
# 递归地切割PDF文件,每一块尽量是完整的一个section,比如introduction,experiment等,必要时再进行切割
# 的长度必须小于 2500 个 Token
@@ -38,7 +40,7 @@ def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot,
last_iteration_result = paper_meta # 初始值是摘要
MAX_WORD_TOTAL = 4096 * 0.7
n_fragment = len(paper_fragments)
if n_fragment >= 20: print('文章极长,不能达到预期效果')
if n_fragment >= 20: logger.warning('文章极长,不能达到预期效果')
for i in range(n_fragment):
NUM_OF_WORD = MAX_WORD_TOTAL // n_fragment
i_say = f"Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} Chinese characters: {paper_fragments[i]}"

查看文件

@@ -1,6 +1,7 @@
from loguru import logger
from toolbox import update_ui
from toolbox import CatchException, report_exception
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from toolbox import write_history_to_file, promote_file_to_downloadzone
fast_debug = False
@@ -57,7 +58,6 @@ def readPdf(pdfPath):
layout = device.get_result()
for obj in layout._objs:
if isinstance(obj, pdfminer.layout.LTTextBoxHorizontal):
# print(obj.get_text())
outTextList.append(obj.get_text())
return outTextList
@@ -66,7 +66,7 @@ def readPdf(pdfPath):
def 解析Paper(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
import time, glob, os
from bs4 import BeautifulSoup
print('begin analysis on:', file_manifest)
logger.info('begin analysis on:', file_manifest)
for index, fp in enumerate(file_manifest):
if ".tex" in fp:
with open(fp, 'r', encoding='utf-8', errors='replace') as f:
@@ -77,7 +77,7 @@ def 解析Paper(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbo
prefix = "接下来请你逐文件分析下面的论文文件,概括其内容" if index==0 else ""
i_say = prefix + f'请对下面的文章片段用中文做一个概述,文件名是{os.path.relpath(fp, project_folder)},文章内容是 ```{file_content}```'
i_say_show_user = prefix + f'[{index}/{len(file_manifest)}] 请对下面的文章片段做一个概述: {os.path.abspath(fp)}'
i_say_show_user = prefix + f'[{index+1}/{len(file_manifest)}] 请对下面的文章片段做一个概述: {os.path.abspath(fp)}'
chatbot.append((i_say_show_user, "[Local Message] waiting gpt response."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面

查看文件

@@ -1,11 +1,11 @@
from toolbox import CatchException, report_exception, get_log_folder, gen_time_str
from toolbox import update_ui, promote_file_to_downloadzone, update_ui_lastest_msg, disable_auto_promotion
from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from .crazy_utils import read_and_clean_pdf_text
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from crazy_functions.crazy_utils import read_and_clean_pdf_text
from .pdf_fns.parse_pdf import parse_pdf, get_avail_grobid_url, translate_pdf
from colorful import *
from shared_utils.colorful import *
import copy
import os
import math
@@ -60,7 +60,7 @@ def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
# 清空历史,以免输入溢出
history = []
from .crazy_utils import get_files_from_everything
from crazy_functions.crazy_utils import get_files_from_everything
success, file_manifest, project_folder = get_files_from_everything(txt, type='.pdf')
if len(file_manifest) > 0:
# 尝试导入依赖,如果缺少依赖,则给出安装建议

查看文件

@@ -1,4 +1,5 @@
import os
from loguru import logger
from toolbox import CatchException, update_ui, gen_time_str, promote_file_to_downloadzone
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import input_clipping
@@ -34,10 +35,10 @@ def eval_manim(code):
return f'gpt_log/{time_str}.mp4'
except subprocess.CalledProcessError as e:
output = e.output.decode()
print(f"Command returned non-zero exit status {e.returncode}: {output}.")
logger.error(f"Command returned non-zero exit status {e.returncode}: {output}.")
return f"Evaluating python script failed: {e.output}."
except:
print('generating mp4 failed')
logger.error('generating mp4 failed')
return "Generating mp4 failed."

查看文件

@@ -1,13 +1,12 @@
from loguru import logger
from toolbox import update_ui
from toolbox import CatchException, report_exception
from .crazy_utils import read_and_clean_pdf_text
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
fast_debug = False
from crazy_functions.crazy_utils import read_and_clean_pdf_text
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
def 解析PDF(file_name, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
import tiktoken
print('begin analysis on:', file_name)
logger.info('begin analysis on:', file_name)
############################## <第 0 步,切割PDF> ##################################
# 递归地切割PDF文件,每一块尽量是完整的一个section,比如introduction,experiment等,必要时再进行切割
@@ -36,7 +35,7 @@ def 解析PDF(file_name, llm_kwargs, plugin_kwargs, chatbot, history, system_pro
last_iteration_result = paper_meta # 初始值是摘要
MAX_WORD_TOTAL = 4096
n_fragment = len(paper_fragments)
if n_fragment >= 20: print('文章极长,不能达到预期效果')
if n_fragment >= 20: logger.warning('文章极长,不能达到预期效果')
for i in range(n_fragment):
NUM_OF_WORD = MAX_WORD_TOTAL // n_fragment
i_say = f"Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} words: {paper_fragments[i]}"
@@ -57,7 +56,7 @@ def 解析PDF(file_name, llm_kwargs, plugin_kwargs, chatbot, history, system_pro
chatbot.append([i_say_show_user, gpt_say])
############################## <第 4 步,设置一个token上限,防止回答时Token溢出> ##################################
from .crazy_utils import input_clipping
from crazy_functions.crazy_utils import input_clipping
_, final_results = input_clipping("", final_results, max_token_limit=3200)
yield from update_ui(chatbot=chatbot, history=final_results) # 注意这里的历史记录被替代了

查看文件

@@ -1,22 +1,21 @@
from loguru import logger
from toolbox import update_ui
from toolbox import CatchException, report_exception
from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
fast_debug = False
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
def 生成函数注释(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
import time, os
print('begin analysis on:', file_manifest)
logger.info('begin analysis on:', file_manifest)
for index, fp in enumerate(file_manifest):
with open(fp, 'r', encoding='utf-8', errors='replace') as f:
file_content = f.read()
i_say = f'请对下面的程序文件做一个概述,并对文件中的所有函数生成注释,使用markdown表格输出结果,文件名是{os.path.relpath(fp, project_folder)},文件内容是 ```{file_content}```'
i_say_show_user = f'[{index}/{len(file_manifest)}] 请对下面的程序文件做一个概述,并对文件中的所有函数生成注释: {os.path.abspath(fp)}'
i_say_show_user = f'[{index+1}/{len(file_manifest)}] 请对下面的程序文件做一个概述,并对文件中的所有函数生成注释: {os.path.abspath(fp)}'
chatbot.append((i_say_show_user, "[Local Message] waiting gpt response."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
if not fast_debug:
msg = '正常'
# ** gpt request **
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
@@ -25,9 +24,8 @@ def 生成函数注释(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
chatbot[-1] = (i_say_show_user, gpt_say)
history.append(i_say_show_user); history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # 刷新界面
if not fast_debug: time.sleep(2)
time.sleep(2)
if not fast_debug:
res = write_history_to_file(history)
promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot.append(("完成了吗?", res))

查看文件

@@ -1,9 +1,11 @@
from toolbox import CatchException, update_ui, report_exception
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import read_and_clean_pdf_text
import datetime
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.plugin_template.plugin_class_template import (
GptAcademicPluginTemplate,
)
from crazy_functions.plugin_template.plugin_class_template import ArgProperty
#以下是每类图表的PROMPT
# 以下是每类图表的PROMPT
SELECT_PROMPT = """
{subject}
=============
@@ -18,22 +20,24 @@ SELECT_PROMPT = """
8 象限提示图
不需要解释原因,仅需要输出单个不带任何标点符号的数字。
"""
#没有思维导图!!!测试发现模型始终会优先选择思维导图
#流程图
# 没有思维导图!!!测试发现模型始终会优先选择思维导图
# 流程图
PROMPT_1 = """
请你给出围绕“{subject}”的逻辑关系图,使用mermaid语法,mermaid语法举例
请你给出围绕“{subject}”的逻辑关系图,使用mermaid语法,注意需要使用双引号将内容括起来。
mermaid语法举例
```mermaid
graph TD
P(编程) --> L1(Python)
P(编程) --> L2(C)
P(编程) --> L3(C++)
P(编程) --> L4(Javascipt)
P(编程) --> L5(PHP)
P("编程") --> L1("Python")
P("编程") --> L2("C")
P("编程") --> L3("C++")
P("编程") --> L4("Javascipt")
P("编程") --> L5("PHP")
```
"""
#序列图
# 序列图
PROMPT_2 = """
请你给出围绕“{subject}”的序列图,使用mermaid语法,mermaid语法举例
请你给出围绕“{subject}”的序列图,使用mermaid语法
mermaid语法举例
```mermaid
sequenceDiagram
participant A as 用户
@@ -44,9 +48,10 @@ sequenceDiagram
B->>A: 返回数据
```
"""
#类图
# 类图
PROMPT_3 = """
请你给出围绕“{subject}”的类图,使用mermaid语法,mermaid语法举例
请你给出围绕“{subject}”的类图,使用mermaid语法
mermaid语法举例
```mermaid
classDiagram
Class01 <|-- AveryLongClass : Cool
@@ -64,9 +69,10 @@ classDiagram
Class08 <--> C2: Cool label
```
"""
#饼图
# 饼图
PROMPT_4 = """
请你给出围绕“{subject}”的饼图,使用mermaid语法,mermaid语法举例
请你给出围绕“{subject}”的饼图,使用mermaid语法,注意需要使用双引号将内容括起来。
mermaid语法举例
```mermaid
pie title Pets adopted by volunteers
"" : 386
@@ -74,38 +80,41 @@ pie title Pets adopted by volunteers
"兔子" : 15
```
"""
#甘特图
# 甘特图
PROMPT_5 = """
请你给出围绕“{subject}”的甘特图,使用mermaid语法,mermaid语法举例
请你给出围绕“{subject}”的甘特图,使用mermaid语法,注意需要使用双引号将内容括起来。
mermaid语法举例
```mermaid
gantt
title 项目开发流程
title "项目开发流程"
dateFormat YYYY-MM-DD
section 设计
需求分析 :done, des1, 2024-01-06,2024-01-08
原型设计 :active, des2, 2024-01-09, 3d
UI设计 : des3, after des2, 5d
section 开发
前端开发 :2024-01-20, 10d
后端开发 :2024-01-20, 10d
section "设计"
"需求分析" :done, des1, 2024-01-06,2024-01-08
"原型设计" :active, des2, 2024-01-09, 3d
"UI设计" : des3, after des2, 5d
section "开发"
"前端开发" :2024-01-20, 10d
"后端开发" :2024-01-20, 10d
```
"""
#状态图
# 状态图
PROMPT_6 = """
请你给出围绕“{subject}”的状态图,使用mermaid语法,mermaid语法举例
请你给出围绕“{subject}”的状态图,使用mermaid语法,注意需要使用双引号将内容括起来。
mermaid语法举例
```mermaid
stateDiagram-v2
[*] --> Still
Still --> [*]
Still --> Moving
Moving --> Still
Moving --> Crash
Crash --> [*]
[*] --> "Still"
"Still" --> [*]
"Still" --> "Moving"
"Moving" --> "Still"
"Moving" --> "Crash"
"Crash" --> [*]
```
"""
#实体关系图
# 实体关系图
PROMPT_7 = """
请你给出围绕“{subject}”的实体关系图,使用mermaid语法,mermaid语法举例
请你给出围绕“{subject}”的实体关系图,使用mermaid语法
mermaid语法举例
```mermaid
erDiagram
CUSTOMER ||--o{ ORDER : places
@@ -125,144 +134,172 @@ erDiagram
}
```
"""
#象限提示图
# 象限提示图
PROMPT_8 = """
请你给出围绕“{subject}”的象限图,使用mermaid语法,mermaid语法举例
请你给出围绕“{subject}”的象限图,使用mermaid语法,注意需要使用双引号将内容括起来。
mermaid语法举例
```mermaid
graph LR
A[Hard skill] --> B(Programming)
A[Hard skill] --> C(Design)
D[Soft skill] --> E(Coordination)
D[Soft skill] --> F(Communication)
A["Hard skill"] --> B("Programming")
A["Hard skill"] --> C("Design")
D["Soft skill"] --> E("Coordination")
D["Soft skill"] --> F("Communication")
```
"""
#思维导图
# 思维导图
PROMPT_9 = """
{subject}
==========
请给出上方内容的思维导图,充分考虑其之间的逻辑,使用mermaid语法,mermaid语法举例
请给出上方内容的思维导图,充分考虑其之间的逻辑,使用mermaid语法,注意需要使用双引号将内容括起来。
mermaid语法举例
```mermaid
mindmap
root((mindmap))
Origins
Long history
("Origins")
("Long history")
::icon(fa fa-book)
Popularisation
British popular psychology author Tony Buzan
Research
On effectiveness<br/>and features
On Automatic creation
Uses
Creative techniques
Strategic planning
Argument mapping
Tools
Pen and paper
Mermaid
("Popularisation")
("British popular psychology author Tony Buzan")
::icon(fa fa-user)
("Research")
("On effectiveness<br/>and features")
::icon(fa fa-search)
("On Automatic creation")
::icon(fa fa-robot)
("Uses")
("Creative techniques")
::icon(fa fa-lightbulb-o)
("Strategic planning")
::icon(fa fa-flag)
("Argument mapping")
::icon(fa fa-comments)
("Tools")
("Pen and paper")
::icon(fa fa-pencil)
("Mermaid")
::icon(fa fa-code)
```
"""
def 解析历史输入(history,llm_kwargs,chatbot,plugin_kwargs):
def 解析历史输入(history, llm_kwargs, file_manifest, chatbot, plugin_kwargs):
############################## <第 0 步,切割输入> ##################################
# 借用PDF切割中的函数对文本进行切割
TOKEN_LIMIT_PER_FRAGMENT = 2500
txt = str(history).encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
from crazy_functions.pdf_fns.breakdown_txt import breakdown_text_to_satisfy_token_limit
txt = breakdown_text_to_satisfy_token_limit(txt=txt, limit=TOKEN_LIMIT_PER_FRAGMENT, llm_model=llm_kwargs['llm_model'])
txt = (
str(history).encode("utf-8", "ignore").decode()
) # avoid reading non-utf8 chars
from crazy_functions.pdf_fns.breakdown_txt import (
breakdown_text_to_satisfy_token_limit,
)
txt = breakdown_text_to_satisfy_token_limit(
txt=txt, limit=TOKEN_LIMIT_PER_FRAGMENT, llm_model=llm_kwargs["llm_model"]
)
############################## <第 1 步,迭代地历遍整个文章,提取精炼信息> ##################################
i_say_show_user = f'首先你从历史记录或文件中提取摘要。'; gpt_say = "[Local Message] 收到。" # 用户提示
chatbot.append([i_say_show_user, gpt_say]); yield from update_ui(chatbot=chatbot, history=history) # 更新UI
results = []
MAX_WORD_TOTAL = 4096
n_txt = len(txt)
last_iteration_result = "从以下文本中提取摘要。"
if n_txt >= 20: print('文章极长,不能达到预期效果')
for i in range(n_txt):
NUM_OF_WORD = MAX_WORD_TOTAL // n_txt
i_say = f"Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} words: {txt[i]}"
i_say = f"Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} words in Chinese: {txt[i]}"
i_say_show_user = f"[{i+1}/{n_txt}] Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} words: {txt[i][:200]} ...."
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(i_say, i_say_show_user, # i_say=真正给chatgpt的提问, i_say_show_user=给用户看的提问
llm_kwargs, chatbot,
history=["The main content of the previous section is?", last_iteration_result], # 迭代上一次的结果
sys_prompt="Extracts the main content from the text section where it is located for graphing purposes, answer me with Chinese." # 提示
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
i_say,
i_say_show_user, # i_say=真正给chatgpt的提问, i_say_show_user=给用户看的提问
llm_kwargs,
chatbot,
history=[
"The main content of the previous section is?",
last_iteration_result,
], # 迭代上一次的结果
sys_prompt="Extracts the main content from the text section where it is located for graphing purposes, answer me with Chinese.", # 提示
)
results.append(gpt_say)
last_iteration_result = gpt_say
############################## <第 2 步,根据整理的摘要选择图表类型> ##################################
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
gpt_say = plugin_kwargs.get("advanced_arg", "") #将图表类型参数赋值为插件参数
results_txt = '\n'.join(results) #合并摘要
if gpt_say not in ['1','2','3','4','5','6','7','8','9']: #如插件参数不正确则使用对话模型判断
i_say_show_user = f'接下来将判断适合的图表类型,如连续3次判断失败将会使用流程图进行绘制'; gpt_say = "[Local Message] 收到。" # 用户提示
chatbot.append([i_say_show_user, gpt_say]); yield from update_ui(chatbot=chatbot, history=[]) # 更新UI
gpt_say = str(plugin_kwargs) # 将图表类型参数赋值为插件参数
results_txt = "\n".join(results) # 合并摘要
if gpt_say not in [
"1",
"2",
"3",
"4",
"5",
"6",
"7",
"8",
"9",
]: # 如插件参数不正确则使用对话模型判断
i_say_show_user = (
f"接下来将判断适合的图表类型,如连续3次判断失败将会使用流程图进行绘制"
)
gpt_say = "[Local Message] 收到。" # 用户提示
chatbot.append([i_say_show_user, gpt_say])
yield from update_ui(chatbot=chatbot, history=[]) # 更新UI
i_say = SELECT_PROMPT.format(subject=results_txt)
i_say_show_user = f'请判断适合使用的流程图类型,其中数字对应关系为:1-流程图,2-序列图,3-类图,4-饼图,5-甘特图,6-状态图,7-实体关系图,8-象限提示图。由于不管提供文本是什么,模型大概率认为"思维导图"最合适,因此思维导图仅能通过参数调用。'
for i in range(3):
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say,
inputs_show_user=i_say_show_user,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=[],
sys_prompt=""
llm_kwargs=llm_kwargs,
chatbot=chatbot,
history=[],
sys_prompt="",
)
if gpt_say in ['1','2','3','4','5','6','7','8','9']: #判断返回是否正确
if gpt_say in [
"1",
"2",
"3",
"4",
"5",
"6",
"7",
"8",
"9",
]: # 判断返回是否正确
break
if gpt_say not in ['1','2','3','4','5','6','7','8','9']:
gpt_say = '1'
if gpt_say not in ["1", "2", "3", "4", "5", "6", "7", "8", "9"]:
gpt_say = "1"
############################## <第 3 步,根据选择的图表类型绘制图表> ##################################
if gpt_say == '1':
if gpt_say == "1":
i_say = PROMPT_1.format(subject=results_txt)
elif gpt_say == '2':
elif gpt_say == "2":
i_say = PROMPT_2.format(subject=results_txt)
elif gpt_say == '3':
elif gpt_say == "3":
i_say = PROMPT_3.format(subject=results_txt)
elif gpt_say == '4':
elif gpt_say == "4":
i_say = PROMPT_4.format(subject=results_txt)
elif gpt_say == '5':
elif gpt_say == "5":
i_say = PROMPT_5.format(subject=results_txt)
elif gpt_say == '6':
elif gpt_say == "6":
i_say = PROMPT_6.format(subject=results_txt)
elif gpt_say == '7':
i_say = PROMPT_7.replace("{subject}", results_txt) #由于实体关系图用到了{}符号
elif gpt_say == '8':
elif gpt_say == "7":
i_say = PROMPT_7.replace("{subject}", results_txt) # 由于实体关系图用到了{}符号
elif gpt_say == "8":
i_say = PROMPT_8.format(subject=results_txt)
elif gpt_say == '9':
elif gpt_say == "9":
i_say = PROMPT_9.format(subject=results_txt)
i_say_show_user = f'请根据判断结果绘制相应的图表。如需绘制思维导图请使用参数调用,同时过大的图表可能需要复制到在线编辑器中进行渲染。'
i_say_show_user = f"请根据判断结果绘制相应的图表。如需绘制思维导图请使用参数调用,同时过大的图表可能需要复制到在线编辑器中进行渲染。"
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say,
inputs_show_user=i_say_show_user,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=[],
sys_prompt="你精通使用mermaid语法来绘制图表,首先确保语法正确,其次避免在mermaid语法中使用不允许的字符,此外也应当分考虑图表的可读性。"
llm_kwargs=llm_kwargs,
chatbot=chatbot,
history=[],
sys_prompt="",
)
history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 界面更新
def 输入区文件处理(txt):
if txt == "": return False, txt
success = True
import glob
from .crazy_utils import get_files_from_everything
file_pdf,pdf_manifest,folder_pdf = get_files_from_everything(txt, '.pdf')
file_md,md_manifest,folder_md = get_files_from_everything(txt, '.md')
if len(pdf_manifest) == 0 and len(md_manifest) == 0:
return False, txt #如输入区内容不是文件则直接返回输入区内容
final_result = ""
if file_pdf:
for index, fp in enumerate(pdf_manifest):
file_content, page_one = read_and_clean_pdf_text(fp) # 尝试按照章节切割PDF
file_content = file_content.encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
final_result += "\n" + file_content
if file_md:
for index, fp in enumerate(md_manifest):
with open(fp, 'r', encoding='utf-8', errors='replace') as f:
file_content = f.read()
file_content = file_content.encode('utf-8', 'ignore').decode()
final_result += "\n" + file_content
return True, final_result
@CatchException
def 生成多种Mermaid图表(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
def 生成多种Mermaid图表(
txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port
):
"""
txt 输入栏用户输入的文本,例如需要翻译的一段话,再例如一个包含了待处理文件的路径
llm_kwargs gpt模型参数,如温度和top_p等,一般原样传递下去就行
@@ -275,28 +312,126 @@ def 生成多种Mermaid图表(txt, llm_kwargs, plugin_kwargs, chatbot, history,
import os
# 基本信息:功能、贡献者
chatbot.append([
chatbot.append(
[
"函数插件功能?",
"根据当前聊天历史或文件(文件内容优先)绘制多种mermaid图表,将会由对话模型首先判断适合的图表类型,随后绘制图表。\
\n您也可以使用插件参数指定绘制的图表类型,函数插件贡献者: Menghuan1918"])
"根据当前聊天历史或指定的路径文件(文件内容优先)绘制多种mermaid图表,将会由对话模型首先判断适合的图表类型,随后绘制图表。\
\n您也可以使用插件参数指定绘制的图表类型,函数插件贡献者: Menghuan1918",
]
)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# 尝试导入依赖,如果缺少依赖,则给出安装建议
try:
import fitz
except:
report_exception(chatbot, history,
a = f"解析项目: {txt}",
b = f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade pymupdf```。")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
if os.path.exists(txt): # 如输入区无内容则直接解析历史记录
from crazy_functions.pdf_fns.parse_word import extract_text_from_files
if os.path.exists(txt): #如输入区无内容则直接解析历史记录
file_exist, txt = 输入区文件处理(txt)
file_exist, final_result, page_one, file_manifest, excption = (
extract_text_from_files(txt, chatbot, history)
)
else:
file_exist = False
excption = ""
file_manifest = []
if file_exist : history = [] #如输入区内容为文件则清空历史记录
history.append(txt) #将解析后的txt传递加入到历史中
if excption != "":
if excption == "word":
report_exception(
chatbot,
history,
a=f"解析项目: {txt}",
b=f"找到了.doc文件,但是该文件格式不被支持,请先转化为.docx格式。",
)
yield from 解析历史输入(history,llm_kwargs,chatbot,plugin_kwargs)
elif excption == "pdf":
report_exception(
chatbot,
history,
a=f"解析项目: {txt}",
b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade pymupdf```。",
)
elif excption == "word_pip":
report_exception(
chatbot,
history,
a=f"解析项目: {txt}",
b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade python-docx pywin32```。",
)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
else:
if not file_exist:
history.append(txt) # 如输入区不是文件则将输入区内容加入历史记录
i_say_show_user = f"首先你从历史记录中提取摘要。"
gpt_say = "[Local Message] 收到。" # 用户提示
chatbot.append([i_say_show_user, gpt_say])
yield from update_ui(chatbot=chatbot, history=history) # 更新UI
yield from 解析历史输入(
history, llm_kwargs, file_manifest, chatbot, plugin_kwargs
)
else:
file_num = len(file_manifest)
for i in range(file_num): # 依次处理文件
i_say_show_user = f"[{i+1}/{file_num}]处理文件{file_manifest[i]}"
gpt_say = "[Local Message] 收到。" # 用户提示
chatbot.append([i_say_show_user, gpt_say])
yield from update_ui(chatbot=chatbot, history=history) # 更新UI
history = [] # 如输入区内容为文件则清空历史记录
history.append(final_result[i])
yield from 解析历史输入(
history, llm_kwargs, file_manifest, chatbot, plugin_kwargs
)
class Mermaid_Gen(GptAcademicPluginTemplate):
def __init__(self):
pass
def define_arg_selection_menu(self):
gui_definition = {
"Type_of_Mermaid": ArgProperty(
title="绘制的Mermaid图表类型",
options=[
"由LLM决定",
"流程图",
"序列图",
"类图",
"饼图",
"甘特图",
"状态图",
"实体关系图",
"象限提示图",
"思维导图",
],
default_value="由LLM决定",
description="选择'由LLM决定'时将由对话模型判断适合的图表类型(不包括思维导图),选择其他类型时将直接绘制指定的图表类型。",
type="dropdown",
).model_dump_json(),
}
return gui_definition
def execute(
txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request
):
options = [
"由LLM决定",
"流程图",
"序列图",
"类图",
"饼图",
"甘特图",
"状态图",
"实体关系图",
"象限提示图",
"思维导图",
]
plugin_kwargs = options.index(plugin_kwargs['Type_of_Mermaid'])
yield from 生成多种Mermaid图表(
txt,
llm_kwargs,
plugin_kwargs,
chatbot,
history,
system_prompt,
user_request,
)

查看文件

@@ -1,6 +1,6 @@
from toolbox import CatchException, update_ui, ProxyNetworkActivate, update_ui_lastest_msg, get_log_folder, get_user
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, get_files_from_everything
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, get_files_from_everything
from loguru import logger
install_msg ="""
1. python -m pip install torch --index-url https://download.pytorch.org/whl/cpu
@@ -40,7 +40,7 @@ def 知识库文件注入(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
except Exception as e:
chatbot.append(["依赖不足", f"{str(e)}\n\n导入依赖失败。请用以下命令安装" + install_msg])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# from .crazy_utils import try_install_deps
# from crazy_functions.crazy_utils import try_install_deps
# try_install_deps(['zh_langchain==0.2.1', 'pypinyin'], reload_m=['pypinyin', 'zh_langchain'])
# yield from update_ui_lastest_msg("安装完成,您可以再次重试。", chatbot, history)
return
@@ -60,7 +60,7 @@ def 知识库文件注入(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
# < -------------------预热文本向量化模组--------------- >
chatbot.append(['<br/>'.join(file_manifest), "正在预热文本向量化模组, 如果是第一次运行, 将消耗较长时间下载中文向量化模型..."])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
print('Checking Text2vec ...')
logger.info('Checking Text2vec ...')
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
with ProxyNetworkActivate('Download_LLM'): # 临时地激活代理网络
HuggingFaceEmbeddings(model_name="GanymedeNil/text2vec-large-chinese")
@@ -68,7 +68,7 @@ def 知识库文件注入(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
# < -------------------构建知识库--------------- >
chatbot.append(['<br/>'.join(file_manifest), "正在构建知识库..."])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
print('Establishing knowledge archive ...')
logger.info('Establishing knowledge archive ...')
with ProxyNetworkActivate('Download_LLM'): # 临时地激活代理网络
kai = knowledge_archive_interface()
vs_path = get_log_folder(user=get_user(chatbot), plugin_name='vec_store')
@@ -93,7 +93,7 @@ def 读取知识库作答(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
except Exception as e:
chatbot.append(["依赖不足", f"{str(e)}\n\n导入依赖失败。请用以下命令安装" + install_msg])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# from .crazy_utils import try_install_deps
# from crazy_functions.crazy_utils import try_install_deps
# try_install_deps(['zh_langchain==0.2.1', 'pypinyin'], reload_m=['pypinyin', 'zh_langchain'])
# yield from update_ui_lastest_msg("安装完成,您可以再次重试。", chatbot, history)
return

查看文件

@@ -1,5 +1,5 @@
from toolbox import CatchException, update_ui
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, input_clipping
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, input_clipping
import requests
from bs4 import BeautifulSoup
from request_llms.bridge_all import model_info
@@ -23,8 +23,8 @@ def google(query, proxies):
item = {'title': title, 'link': link}
results.append(item)
for r in results:
print(r['link'])
# for r in results:
# print(r['link'])
return results
def scrape_text(url, proxies) -> str:

查看文件

@@ -1,5 +1,5 @@
from toolbox import CatchException, update_ui
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, input_clipping
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, input_clipping
import requests
from bs4 import BeautifulSoup
from request_llms.bridge_all import model_info
@@ -22,8 +22,8 @@ def bing_search(query, proxies=None):
item = {'title': title, 'link': link}
results.append(item)
for r in results:
print(r['link'])
# for r in results:
# print(r['link'])
return results

查看文件

@@ -12,6 +12,12 @@ class PaperFileGroup():
self.sp_file_index = []
self.sp_file_tag = []
# count_token
from request_llms.bridge_all import model_info
enc = model_info["gpt-3.5-turbo"]['tokenizer']
def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
self.get_token_num = get_token_num
def run_file_split(self, max_token_limit=1900):
"""
将长文本分离开来
@@ -58,7 +64,7 @@ def parseNotebook(filename, enable_markdown=1):
def ipynb解释(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
enable_markdown = plugin_kwargs.get("advanced_arg", "1")

查看文件

@@ -1,5 +1,5 @@
from toolbox import CatchException, update_ui, get_conf
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
import datetime
@CatchException
def 同时问询(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):

查看文件

@@ -1,11 +1,13 @@
from toolbox import update_ui
from toolbox import CatchException, get_conf, markdown_convertion
from request_llms.bridge_all import predict_no_ui_long_connection
from crazy_functions.crazy_utils import input_clipping
from crazy_functions.agent_fns.watchdog import WatchDog
from request_llms.bridge_all import predict_no_ui_long_connection
from crazy_functions.live_audio.aliyunASR import AliyunASR
from loguru import logger
import threading, time
import numpy as np
from .live_audio.aliyunASR import AliyunASR
import json
import re
@@ -42,9 +44,9 @@ class AsyncGptTask():
gpt_say_partial = predict_no_ui_long_connection(inputs=i_say, llm_kwargs=llm_kwargs, history=history, sys_prompt=sys_prompt,
observe_window=observe_window[index], console_slience=True)
except ConnectionAbortedError as token_exceed_err:
print('至少一个线程任务Token溢出而失败', e)
logger.error('至少一个线程任务Token溢出而失败', e)
except Exception as e:
print('至少一个线程任务意外失败', e)
logger.error('至少一个线程任务意外失败', e)
def add_async_gpt_task(self, i_say, chatbot_index, llm_kwargs, history, system_prompt):
self.observe_future.append([""])

查看文件

@@ -1,19 +1,18 @@
from toolbox import update_ui
from toolbox import CatchException, report_exception
from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
def 解析Paper(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
import time, glob, os
print('begin analysis on:', file_manifest)
for index, fp in enumerate(file_manifest):
with open(fp, 'r', encoding='utf-8', errors='replace') as f:
file_content = f.read()
prefix = "接下来请你逐文件分析下面的论文文件,概括其内容" if index==0 else ""
i_say = prefix + f'请对下面的文章片段用中文做一个概述,文件名是{os.path.relpath(fp, project_folder)},文章内容是 ```{file_content}```'
i_say_show_user = prefix + f'[{index}/{len(file_manifest)}] 请对下面的文章片段做一个概述: {os.path.abspath(fp)}'
i_say_show_user = prefix + f'[{index+1}/{len(file_manifest)}] 请对下面的文章片段做一个概述: {os.path.abspath(fp)}'
chatbot.append((i_say_show_user, "[Local Message] waiting gpt response."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面

查看文件

@@ -1,4 +1,4 @@
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from toolbox import CatchException, report_exception, promote_file_to_downloadzone
from toolbox import update_ui, update_ui_lastest_msg, disable_auto_promotion, write_history_to_file
import logging

查看文件

@@ -2,6 +2,10 @@ from toolbox import CatchException, update_ui
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
import datetime
####################################################################################################################
# Demo 1: 一个非常简单的插件 #########################################################################################
####################################################################################################################
高阶功能模板函数示意图 = f"""
```mermaid
flowchart TD
@@ -26,7 +30,7 @@ flowchart TD
"""
@CatchException
def 高阶功能模板函数(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
def 高阶功能模板函数(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request, num_day=5):
"""
# 高阶功能模板函数示意图https://mermaid.live/edit#pako:eNptk1tvEkEYhv8KmattQpvlvOyFCcdeeaVXuoYssBwie8gyhCIlqVoLhrbbtAWNUpEGUkyMEDW2Fmn_DDOL_8LZHdOwxrnamX3f7_3mmZk6yKhZCfAgV1KrmYKoQ9fDuKC4yChX0nld1Aou1JzjznQ5fWmejh8LYHW6vG2a47YAnlCLNSIRolnenKBXI_zRIBrcuqRT890u7jZx7zMDt-AaMbnW1--5olGiz2sQjwfoQxsZL0hxplSSU0-rop4vrzmKR6O2JxYjHmwcL2Y_HDatVMkXlf86YzHbGY9bO5j8XE7O8Nsbc3iNB3ukL2SMcH-XIQBgWoVOZzxuOxOJOyc63EPGV6ZQLENVrznViYStTiaJ2vw2M2d9bByRnOXkgCnXylCSU5quyto_IcmkbdvctELmJ-j1ASW3uB3g5xOmKqVTmqr_Na3AtuS_dtBFm8H90XJyHkDDT7S9xXWb4HGmRChx64AOL5HRpUm411rM5uh4H78Z4V7fCZzytjZz2seto9XaNPFue07clLaVZF8UNLygJ-VES8lah_n-O-5Ozc7-77NzJ0-K0yr0ZYrmHdqAk50t2RbA4qq9uNohBASw7YpSgaRkLWCCAtxAlnRZLGbJba9bPwUAC5IsCYAnn1kpJ1ZKUACC0iBSsQLVBzUlA3ioVyQ3qGhZEUrxokiehAz4nFgqk1VNVABfB1uAD_g2_AGPl-W8nMcbCvsDblADfNCz4feyobDPy3rYEMtxwYYbPFNVUoHdCPmDHBv2cP4AMfrCbiBli-Q-3afv0X6WdsIjW2-10fgDy1SAig
@@ -43,7 +47,7 @@ def 高阶功能模板函数(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
"您正在调用插件:历史上的今天",
"[Local Message] 请注意,您正在调用一个[函数插件]的模板,该函数面向希望实现更多有趣功能的开发者,它可以作为创建新功能函数的模板该函数只有20多行代码。此外我们也提供可同步处理大量文件的多线程Demo供您参考。您若希望分享新的功能模组,请不吝PR" + 高阶功能模板函数示意图))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 由于请求gpt需要一段时间,我们先及时地做一次界面更新
for i in range(5):
for i in range(int(num_day)):
currentMonth = (datetime.date.today() + datetime.timedelta(days=i)).month
currentDay = (datetime.date.today() + datetime.timedelta(days=i)).day
i_say = f'历史中哪些事件发生在{currentMonth}{currentDay}日?列举两条并发送相关图片。发送图片时,请使用Markdown,将Unsplash API中的PUT_YOUR_QUERY_HERE替换成描述该事件的一个最重要的单词。'
@@ -59,6 +63,56 @@ def 高阶功能模板函数(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
####################################################################################################################
# Demo 2: 一个带二级菜单的插件 #######################################################################################
####################################################################################################################
from crazy_functions.plugin_template.plugin_class_template import GptAcademicPluginTemplate, ArgProperty
class Demo_Wrap(GptAcademicPluginTemplate):
def __init__(self):
"""
请注意`execute`会执行在不同的线程中,因此您在定义和使用类变量时,应当慎之又慎!
"""
pass
def define_arg_selection_menu(self):
"""
定义插件的二级选项菜单
"""
gui_definition = {
"num_day":
ArgProperty(title="日期选择", options=["仅今天", "未来3天", "未来5天"], default_value="未来3天", description="", type="dropdown").model_dump_json(),
}
return gui_definition
def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
"""
执行插件
"""
num_day = plugin_kwargs["num_day"]
if num_day == "仅今天": num_day = 1
if num_day == "未来3天": num_day = 3
if num_day == "未来5天": num_day = 5
yield from 高阶功能模板函数(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request, num_day=num_day)
####################################################################################################################
# Demo 3: 绘制脑图的Demo ############################################################################################
####################################################################################################################
PROMPT = """
请你给出围绕“{subject}”的逻辑关系图,使用mermaid语法,mermaid语法举例
```mermaid

查看文件

@@ -4,9 +4,9 @@
# 1. 请在以下方案中选择任意一种,然后删除其他的方案
# 2. 修改你选择的方案中的environment环境变量,详情请见github wiki或者config.py
# 3. 选择一种暴露服务端口的方法,并对相应的配置做出修改:
# 方法1: 适用于Linux,很方便,可惜windows不支持与宿主的网络融合为一体,这个是默认配置
# 方法1: 适用于Linux,很方便,可惜windows不支持与宿主的网络融合为一体,这个是默认配置
# network_mode: "host"
# 方法2: 适用于所有系统包括Windows和MacOS端口映射,把容器的端口映射到宿主的端口注意您需要先删除network_mode: "host",再追加以下内容)
# 方法2: 适用于所有系统包括Windows和MacOS端口映射,把容器的端口映射到宿主的端口注意您需要先删除network_mode: "host",再追加以下内容)
# ports:
# - "12345:12345" # 注意12345必须与WEB_PORT环境变量相互对应
# 4. 最后`docker-compose up`运行
@@ -25,7 +25,7 @@
## ===================================================
## ===================================================
## 方案零 部署项目的全部能力这个是包含cuda和latex的大型镜像。如果您网速慢、硬盘小或没有显卡,则不推荐使用这个
## 方案零 部署项目的全部能力这个是包含cuda和latex的大型镜像。如果您网速慢、硬盘小或没有显卡,则不推荐使用这个
## ===================================================
version: '3'
services:
@@ -63,10 +63,10 @@ services:
# count: 1
# capabilities: [gpu]
# WEB_PORT暴露方法1: 适用于Linux与宿主的网络融合
# WEB_PORT暴露方法1: 适用于Linux与宿主的网络融合
network_mode: "host"
# WEB_PORT暴露方法2: 适用于所有系统端口映射
# WEB_PORT暴露方法2: 适用于所有系统端口映射
# ports:
# - "12345:12345" # 12345必须与WEB_PORT相互对应
@@ -75,10 +75,8 @@ services:
bash -c "python3 -u main.py"
## ===================================================
## 方案一 如果不需要运行本地模型(仅 chatgpt, azure, 星火, 千帆, claude 等在线大模型服务)
## 方案一 如果不需要运行本地模型(仅 chatgpt, azure, 星火, 千帆, claude 等在线大模型服务)
## ===================================================
version: '3'
services:
@@ -97,16 +95,16 @@ services:
# DEFAULT_WORKER_NUM: ' 10 '
# AUTHENTICATION: ' [("username", "passwd"), ("username2", "passwd2")] '
# 与宿主的网络融合
# 「WEB_PORT暴露方法1: 适用于Linux」与宿主的网络融合
network_mode: "host"
# 不使用代理网络拉取最新代码
# 启动命令
command: >
bash -c "python3 -u main.py"
### ===================================================
### 方案二 如果需要运行ChatGLM + Qwen + MOSS等本地模型
### 方案二 如果需要运行ChatGLM + Qwen + MOSS等本地模型
### ===================================================
version: '3'
services:
@@ -130,8 +128,10 @@ services:
devices:
- /dev/nvidia0:/dev/nvidia0
# 与宿主的网络融合
# 「WEB_PORT暴露方法1: 适用于Linux」与宿主的网络融合
network_mode: "host"
# 启动命令
command: >
bash -c "python3 -u main.py"
@@ -139,8 +139,9 @@ services:
# command: >
# bash -c "pip install -r request_llms/requirements_qwen.txt && python3 -u main.py"
### ===================================================
### 方案三 如果需要运行ChatGPT + LLAMA + 盘古 + RWKV本地模型
### 方案三 如果需要运行ChatGPT + LLAMA + 盘古 + RWKV本地模型
### ===================================================
version: '3'
services:
@@ -164,21 +165,22 @@ services:
devices:
- /dev/nvidia0:/dev/nvidia0
# 与宿主的网络融合
# 「WEB_PORT暴露方法1: 适用于Linux」与宿主的网络融合
network_mode: "host"
# 不使用代理网络拉取最新代码
# 启动命令
command: >
python3 -u main.py
## ===================================================
## 方案四 ChatGPT + Latex
## 方案四 ChatGPT + Latex
## ===================================================
version: '3'
services:
gpt_academic_with_latex:
image: ghcr.io/binary-husky/gpt_academic_with_latex:master # (Auto Built by Dockerfile: docs/GithubAction+NoLocal+Latex)
# 对于ARM64设备,请将以上镜像名称替换为 ghcr.io/binary-husky/gpt_academic_with_latex_arm:master
environment:
# 请查阅 `config.py` 以查看所有的配置信息
API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
@@ -190,16 +192,16 @@ services:
DEFAULT_WORKER_NUM: ' 10 '
WEB_PORT: ' 12303 '
# 与宿主的网络融合
# 「WEB_PORT暴露方法1: 适用于Linux」与宿主的网络融合
network_mode: "host"
# 不使用代理网络拉取最新代码
# 启动命令
command: >
bash -c "python3 -u main.py"
## ===================================================
## 方案五 ChatGPT + 语音助手 (请先阅读 docs/use_audio.md
## 方案五 ChatGPT + 语音助手 (请先阅读 docs/use_audio.md
## ===================================================
version: '3'
services:
@@ -223,9 +225,9 @@ services:
# (无需填写) ALIYUN_ACCESSKEY: ' LTAI5q6BrFUzoRXVGUWnekh1 '
# (无需填写) ALIYUN_SECRET: ' eHmI20AVWIaQZ0CiTD2bGQVsaP9i68 '
# 与宿主的网络融合
# 「WEB_PORT暴露方法1: 适用于Linux」与宿主的网络融合
network_mode: "host"
# 不使用代理网络拉取最新代码
# 启动命令
command: >
bash -c "python3 -u main.py"

查看文件

@@ -1 +0,0 @@
# 此Dockerfile不再维护,请前往docs/GithubAction+JittorLLMs

查看文件

@@ -3,6 +3,9 @@
# 从NVIDIA源,从而支持显卡检查宿主的nvidia-smi中的cuda版本必须>=11.3
FROM fuqingxu/11.3.1-runtime-ubuntu20.04-with-texlive:latest
# edge-tts需要的依赖,某些pip包所需的依赖
RUN apt update && apt install ffmpeg build-essential -y
# use python3 as the system default python
WORKDIR /gpt
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8

查看文件

@@ -1,53 +0,0 @@
# docker build -t gpt-academic-all-capacity -f docs/GithubAction+AllCapacity --network=host --build-arg http_proxy=http://localhost:10881 --build-arg https_proxy=http://localhost:10881 .
# docker build -t gpt-academic-all-capacity -f docs/GithubAction+AllCapacityBeta --network=host .
# docker run -it --net=host gpt-academic-all-capacity bash
# 从NVIDIA源,从而支持显卡检查宿主的nvidia-smi中的cuda版本必须>=11.3
FROM fuqingxu/11.3.1-runtime-ubuntu20.04-with-texlive:latest
# use python3 as the system default python
WORKDIR /gpt
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
# # 非必要步骤,更换pip源 (以下三行,可以删除)
# RUN echo '[global]' > /etc/pip.conf && \
# echo 'index-url = https://mirrors.aliyun.com/pypi/simple/' >> /etc/pip.conf && \
# echo 'trusted-host = mirrors.aliyun.com' >> /etc/pip.conf
# 下载pytorch
RUN python3 -m pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
# 准备pip依赖
RUN python3 -m pip install openai numpy arxiv rich
RUN python3 -m pip install colorama Markdown pygments pymupdf
RUN python3 -m pip install python-docx moviepy pdfminer
RUN python3 -m pip install zh_langchain==0.2.1 pypinyin
RUN python3 -m pip install rarfile py7zr
RUN python3 -m pip install aliyun-python-sdk-core==2.13.3 pyOpenSSL webrtcvad scipy git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git
# 下载分支
WORKDIR /gpt
RUN git clone --depth=1 https://github.com/binary-husky/gpt_academic.git
WORKDIR /gpt/gpt_academic
RUN git clone --depth=1 https://github.com/OpenLMLab/MOSS.git request_llms/moss
RUN python3 -m pip install -r requirements.txt
RUN python3 -m pip install -r request_llms/requirements_moss.txt
RUN python3 -m pip install -r request_llms/requirements_qwen.txt
RUN python3 -m pip install -r request_llms/requirements_chatglm.txt
RUN python3 -m pip install -r request_llms/requirements_newbing.txt
RUN python3 -m pip install nougat-ocr
# 预热Tiktoken模块
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
# 安装知识库插件的额外依赖
RUN apt-get update && apt-get install libgl1 -y
RUN pip3 install transformers protobuf langchain sentence-transformers faiss-cpu nltk beautifulsoup4 bitsandbytes tabulate icetk --upgrade
RUN pip3 install unstructured[all-docs] --upgrade
RUN python3 -c 'from check_proxy import warm_up_vectordb; warm_up_vectordb()'
RUN rm -rf /usr/local/lib/python3.8/dist-packages/tests
# COPY .cache /root/.cache
# COPY config_private.py config_private.py
# 启动
CMD ["python3", "-u", "main.py"]

查看文件

@@ -5,6 +5,8 @@ RUN apt-get update
RUN apt-get install -y curl proxychains curl gcc
RUN apt-get install -y git python python3 python-dev python3-dev --fix-missing
# edge-tts需要的依赖,某些pip包所需的依赖
RUN apt update && apt install ffmpeg build-essential -y
# use python3 as the system default python
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
@@ -22,7 +24,6 @@ RUN python3 -m pip install -r request_llms/requirements_chatglm.txt
RUN python3 -m pip install -r request_llms/requirements_newbing.txt
# 预热Tiktoken模块
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'

查看文件

@@ -23,6 +23,9 @@ RUN python3 -m pip install -r request_llms/requirements_jittorllms.txt -i https:
# 下载JittorLLMs
RUN git clone https://github.com/binary-husky/JittorLLMs.git --depth 1 request_llms/jittorllms
# edge-tts需要的依赖
RUN apt update && apt install ffmpeg -y
# 禁用缓存,确保更新代码
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
RUN git pull

查看文件

@@ -12,6 +12,8 @@ COPY . .
# 安装依赖
RUN pip3 install -r requirements.txt
# edge-tts需要的依赖
RUN apt update && apt install ffmpeg -y
# 可选步骤,用于预热模块
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'

查看文件

@@ -13,7 +13,10 @@ COPY . .
RUN pip3 install -r requirements.txt
# 安装语音插件的额外依赖
RUN pip3 install pyOpenSSL scipy git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git
RUN pip3 install aliyun-python-sdk-core==2.13.3 pyOpenSSL webrtcvad scipy git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git
# edge-tts需要的依赖
RUN apt update && apt install ffmpeg -y
# 可选步骤,用于预热模块
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'

查看文件

@@ -1,29 +1,31 @@
# 此Dockerfile适用于无本地模型的环境构建,如果需要使用chatglm等本地模型,请参考 docs/Dockerfile+ChatGLM
# 此Dockerfile适用于"无本地模型"的环境构建,如果需要使用chatglm等本地模型,请参考 docs/Dockerfile+ChatGLM
# - 1 修改 `config.py`
# - 2 构建 docker build -t gpt-academic-nolocal-latex -f docs/GithubAction+NoLocal+Latex .
# - 3 运行 docker run -v /home/fuqingxu/arxiv_cache:/root/arxiv_cache --rm -it --net=host gpt-academic-nolocal-latex
FROM fuqingxu/python311_texlive_ctex:latest
ENV PATH "$PATH:/usr/local/texlive/2022/bin/x86_64-linux"
ENV PATH "$PATH:/usr/local/texlive/2023/bin/x86_64-linux"
ENV PATH "$PATH:/usr/local/texlive/2024/bin/x86_64-linux"
ENV PATH "$PATH:/usr/local/texlive/2025/bin/x86_64-linux"
ENV PATH "$PATH:/usr/local/texlive/2026/bin/x86_64-linux"
# 指定路径
FROM menghuan1918/ubuntu_uv_ctex:latest
ENV DEBIAN_FRONTEND=noninteractive
SHELL ["/bin/bash", "-c"]
WORKDIR /gpt
RUN pip3 install openai numpy arxiv rich
RUN pip3 install colorama Markdown pygments pymupdf
RUN pip3 install python-docx pdfminer
RUN pip3 install nougat-ocr
# 装载项目文件
COPY . .
# 先复制依赖文件
COPY requirements.txt .
# 安装依赖
RUN pip3 install -r requirements.txt
RUN pip install --break-system-packages openai numpy arxiv rich colorama Markdown pygments pymupdf python-docx pdfminer \
&& pip install --break-system-packages -r requirements.txt \
&& if [ "$(uname -m)" = "x86_64" ]; then \
pip install --break-system-packages nougat-ocr; \
fi \
&& pip cache purge \
&& rm -rf /root/.cache/pip/*
# 创建非root用户
RUN useradd -m gptuser && chown -R gptuser /gpt
USER gptuser
# 最后才复制代码文件,这样代码更新时只需重建最后几层,可以大幅减少docker pull所需的大小
COPY --chown=gptuser:gptuser . .
# 可选步骤,用于预热模块
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'

查看文件

@@ -19,6 +19,9 @@ RUN pip3 install transformers protobuf langchain sentence-transformers faiss-cp
RUN pip3 install unstructured[all-docs] --upgrade
RUN python3 -c 'from check_proxy import warm_up_vectordb; warm_up_vectordb()'
# edge-tts需要的依赖
RUN apt update && apt install ffmpeg -y
# 可选步骤,用于预热模块
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'

查看文件

@@ -4,7 +4,7 @@ We currently support fastapi in order to solve sub-path deploy issue.
1. change CUSTOM_PATH setting in `config.py`
``` sh
```sh
nano config.py
```
@@ -35,9 +35,8 @@ if __name__ == "__main__":
main()
```
3. Go!
``` sh
```sh
python main.py
```

查看文件

@@ -0,0 +1,189 @@
# 实现带二级菜单的插件
## 一、如何写带有二级菜单的插件
1. 声明一个 `Class`,继承父类 `GptAcademicPluginTemplate`
```python
from crazy_functions.plugin_template.plugin_class_template import GptAcademicPluginTemplate
from crazy_functions.plugin_template.plugin_class_template import ArgProperty
class Demo_Wrap(GptAcademicPluginTemplate):
def __init__(self): ...
```
2. 声明二级菜单中需要的变量,覆盖父类的`define_arg_selection_menu`函数。
```python
class Demo_Wrap(GptAcademicPluginTemplate):
...
def define_arg_selection_menu(self):
"""
定义插件的二级选项菜单
第一个参数,名称`main_input`,参数`type`声明这是一个文本框,文本框上方显示`title`,文本框内部显示`description`,`default_value`为默认值;
第二个参数,名称`advanced_arg`,参数`type`声明这是一个文本框,文本框上方显示`title`,文本框内部显示`description`,`default_value`为默认值;
第三个参数,名称`allow_cache`,参数`type`声明这是一个下拉菜单,下拉菜单上方显示`title`+`description`,下拉菜单的选项为`options`,`default_value`为下拉菜单默认值;
"""
gui_definition = {
"main_input":
ArgProperty(title="ArxivID", description="输入Arxiv的ID或者网址", default_value="", type="string").model_dump_json(),
"advanced_arg":
ArgProperty(title="额外的翻译提示词",
description=r"如果有必要, 请在此处给出自定义翻译命令",
default_value="", type="string").model_dump_json(),
"allow_cache":
ArgProperty(title="是否允许从缓存中调取结果", options=["允许缓存", "从头执行"], default_value="允许缓存", description="无", type="dropdown").model_dump_json(),
}
return gui_definition
...
```
> [!IMPORTANT]
>
> ArgProperty 中每个条目对应一个参数,`type == "string"`时,使用文本块,`type == dropdown`时,使用下拉菜单。
>
> 注意:`main_input` 和 `advanced_arg`是两个特殊的参数。`main_input`会自动与界面右上角的`输入区`进行同步,而`advanced_arg`会自动与界面右下角的`高级参数输入区`同步。除此之外,参数名称可以任意选取。其他细节详见`crazy_functions/plugin_template/plugin_class_template.py`。
3. 编写插件程序,覆盖父类的`execute`函数。
例如:
```python
class Demo_Wrap(GptAcademicPluginTemplate):
...
...
def execute(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request):
"""
执行插件
plugin_kwargs字典中会包含用户的选择,与上述 `define_arg_selection_menu` 一一对应
"""
allow_cache = plugin_kwargs["allow_cache"]
advanced_arg = plugin_kwargs["advanced_arg"]
if allow_cache == "从头执行": plugin_kwargs["advanced_arg"] = "--no-cache " + plugin_kwargs["advanced_arg"]
yield from Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)
```
4. 注册插件
将以下条目插入`crazy_functional.py`即可。注意,与旧插件不同的是,`Function`键值应该为None,而`Class`键值为上述插件的类名称(`Demo_Wrap`)。
```
"新插件": {
"Group": "学术",
"Color": "stop",
"AsButton": True,
"Info": "插件说明",
"Function": None,
"Class": Demo_Wrap,
},
```
5. 已经结束了,启动程序测试吧~
## 二、背后的原理需要JavaScript的前置知识
### (I) 首先介绍三个Gradio官方没有的重要前端函数
主javascript程序`common.js`中有三个Gradio官方没有的重要API
1. `get_data_from_gradio_component`
这个函数可以获取任意gradio组件的当前值,例如textbox中的字符,dropdown中的当前选项,chatbot当前的对话等等。调用方法举例
```javascript
// 获取当前的对话
let chatbot = await get_data_from_gradio_component('gpt-chatbot');
```
2. `get_gradio_component`
有时候我们不仅需要gradio组件的当前值,还需要它的label值、是否隐藏、下拉菜单其他可选选项等等,而通过这个函数可以直接获取这个组件的句柄。举例
```javascript
// 获取下拉菜单组件的句柄
var model_sel = await get_gradio_component("elem_model_sel");
// 获取它的所有属性,包括其所有可选选项
console.log(model_sel.props)
```
3. `push_data_to_gradio_component`
这个函数可以将数据推回gradio组件,例如textbox中的字符,dropdown中的当前选项等等。调用方法举例
```javascript
// 修改一个按钮上面的文本
push_data_to_gradio_component("btnName", "gradio_element_id", "string");
// 隐藏一个组件
push_data_to_gradio_component({ visible: false, __type__: 'update' }, "plugin_arg_menu", "obj");
// 修改组件label
push_data_to_gradio_component({ label: '新label的值', __type__: 'update' }, "gpt-chatbot", "obj")
// 第一个参数是value,
// - 可以是字符串调整textbox的文本,按钮的文本
// - 还可以是 { visible: false, __type__: 'update' } 这样的字典调整visible, label, choices
// 第二个参数是elem_id
// 第三个参数是"string" 或者 "obj"
```
### (II) 从点击插件到执行插件的逻辑过程
简述程序启动时把每个插件的二级菜单编码为BASE64,存储在用户的浏览器前端,用户调用对应功能时,会按照插件的BASE64编码,将平时隐藏的菜单有选择性地显示出来。
1. 启动阶段(主函数 `main.py` 中,遍历每个插件,生成二级菜单的BASE64编码,存入变量`register_advanced_plugin_init_code_arr`。
```python
def get_js_code_for_generating_menu(self, btnName):
define_arg_selection = self.define_arg_selection_menu()
DEFINE_ARG_INPUT_INTERFACE = json.dumps(define_arg_selection)
return base64.b64encode(DEFINE_ARG_INPUT_INTERFACE.encode('utf-8')).decode('utf-8')
```
2. 用户加载阶段主javascript程序`common.js`中),浏览器加载`register_advanced_plugin_init_code_arr`,存入本地的字典`advanced_plugin_init_code_lib`
```javascript
advanced_plugin_init_code_lib = {}
function register_advanced_plugin_init_code(key, code){
advanced_plugin_init_code_lib[key] = code;
}
```
3. 用户点击插件按钮(主函数 `main.py` 中时,仅执行以下javascript代码,唤醒隐藏的二级菜单生成菜单的代码在`common.js`中的`generate_menu`函数上):
```javascript
// 生成高级插件的选择菜单
function run_advanced_plugin_launch_code(key){
generate_menu(advanced_plugin_init_code_lib[key], key);
}
function on_flex_button_click(key){
run_advanced_plugin_launch_code(key);
}
```
```python
click_handle = plugins[k]["Button"].click(None, inputs=[], outputs=None, _js=f"""()=>run_advanced_plugin_launch_code("{k}")""")
```
4. 当用户点击二级菜单的执行键时,通过javascript脚本模拟点击一个隐藏按钮,触发后续程序`common.js`中的`execute_current_pop_up_plugin`,会把二级菜单中的参数缓存到`invisible_current_pop_up_plugin_arg_final`,然后模拟点击`invisible_callback_btn_for_plugin_exe`按钮)。隐藏按钮的定义在(主函数 `main.py` ),该隐藏按钮会最终触发`route_switchy_bt_with_arg`函数(定义于`themes/gui_advanced_plugin_class.py`
```python
click_handle_ng = new_plugin_callback.click(route_switchy_bt_with_arg, [
gr.State(["new_plugin_callback", "usr_confirmed_arg"] + input_combo_order),
new_plugin_callback, usr_confirmed_arg, *input_combo
], output_combo)
```
5. 最后,`route_switchy_bt_with_arg`中,会搜集所有用户参数,统一集中到`plugin_kwargs`参数中,并执行对应插件的`execute`函数。

查看文件

@@ -22,13 +22,13 @@
| crazy_functions\下载arxiv论文翻译摘要.py | 下载 `arxiv` 论文的 PDF 文件,并提取摘要和翻译 |
| crazy_functions\代码重写为全英文_多线程.py | 将Python源代码文件中的中文内容转化为英文 |
| crazy_functions\图片生成.py | 根据激励文本使用GPT模型生成相应的图像 |
| crazy_functions\对话历史存档.py | 将每次对话记录写入Markdown格式的文件中 |
| crazy_functions\Conversation_To_File.py | 将每次对话记录写入Markdown格式的文件中 |
| crazy_functions\总结word文档.py | 对输入的word文档进行摘要生成 |
| crazy_functions\总结音视频.py | 对输入的音视频文件进行摘要生成 |
| crazy_functions\批量Markdown翻译.py | 将指定目录下的Markdown文件进行中英文翻译 |
| crazy_functions\Markdown_Translate.py | 将指定目录下的Markdown文件进行中英文翻译 |
| crazy_functions\批量总结PDF文档.py | 对PDF文件进行切割和摘要生成 |
| crazy_functions\批量总结PDF文档pdfminer.py | 对PDF文件进行文本内容的提取和摘要生成 |
| crazy_functions\批量翻译PDF文档_多线程.py | 将指定目录下的PDF文件进行中英文翻译 |
| crazy_functions\PDF_Translate.py | 将指定目录下的PDF文件进行中英文翻译 |
| crazy_functions\理解PDF文档内容.py | 对PDF文件进行摘要生成和问题解答 |
| crazy_functions\生成函数注释.py | 自动生成Python函数的注释 |
| crazy_functions\联网的ChatGPT.py | 使用网络爬虫和ChatGPT模型进行聊天回答 |
@@ -155,9 +155,9 @@ toolbox.py是一个工具类库,其中主要包含了一些函数装饰器和
该程序文件提供了一个用于生成图像的函数`图片生成`。函数实现的过程中,会调用`gen_image`函数来生成图像,并返回图像生成的网址和本地文件地址。函数有多个参数,包括`prompt`(激励文本)、`llm_kwargs`(GPT模型的参数)、`plugin_kwargs`(插件模型的参数)等。函数核心代码使用了`requests`库向OpenAI API请求图像,并做了简单的处理和保存。函数还更新了交互界面,清空聊天历史并显示正在生成图像的消息和最终的图像网址和预览。
## [18/48] 请对下面的程序文件做一个概述: crazy_functions\对话历史存档.py
## [18/48] 请对下面的程序文件做一个概述: crazy_functions\Conversation_To_File.py
这个文件是名为crazy_functions\对话历史存档.py的Python程序文件,包含了4个函数
这个文件是名为crazy_functions\Conversation_To_File.py的Python程序文件,包含了4个函数
1. write_chat_to_file(chatbot, history=None, file_name=None)用来将对话记录以Markdown格式写入文件中,并且生成文件名,如果没指定文件名则用当前时间。写入完成后将文件路径打印出来。
@@ -165,7 +165,7 @@ toolbox.py是一个工具类库,其中主要包含了一些函数装饰器和
3. read_file_to_chat(chatbot, history, file_name):从传入的文件中读取内容,解析出对话历史记录并更新聊天显示框。
4. 对话历史存档(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)一个主要函数,用于保存当前对话记录并提醒用户。如果用户希望加载历史记录,则调用read_file_to_chat()来更新聊天显示框。如果用户希望删除历史记录,调用删除所有本地对话历史记录()函数完成删除操作。
4. Conversation_To_File(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_request)一个主要函数,用于保存当前对话记录并提醒用户。如果用户希望加载历史记录,则调用read_file_to_chat()来更新聊天显示框。如果用户希望删除历史记录,调用删除所有本地对话历史记录()函数完成删除操作。
## [19/48] 请对下面的程序文件做一个概述: crazy_functions\总结word文档.py
@@ -175,9 +175,9 @@ toolbox.py是一个工具类库,其中主要包含了一些函数装饰器和
该程序文件包括两个函数split_audio_file()和AnalyAudio(),并且导入了一些必要的库并定义了一些工具函数。split_audio_file用于将音频文件分割成多个时长相等的片段,返回一个包含所有切割音频片段文件路径的列表,而AnalyAudio用来分析音频文件,通过调用whisper模型进行音频转文字并使用GPT模型对音频内容进行概述,最终将所有总结结果写入结果文件中。
## [21/48] 请对下面的程序文件做一个概述: crazy_functions\批量Markdown翻译.py
## [21/48] 请对下面的程序文件做一个概述: crazy_functions\Markdown_Translate.py
该程序文件名为`批量Markdown翻译.py`,包含了以下功能读取Markdown文件,将长文本分离开来,将Markdown文件进行翻译英译中和中译英,整理结果并退出。程序使用了多线程以提高效率。程序使用了`tiktoken`依赖库,可能需要额外安装。文件中还有一些其他的函数和类,但与文件名所描述的功能无关。
该程序文件名为`Markdown_Translate.py`,包含了以下功能读取Markdown文件,将长文本分离开来,将Markdown文件进行翻译英译中和中译英,整理结果并退出。程序使用了多线程以提高效率。程序使用了`tiktoken`依赖库,可能需要额外安装。文件中还有一些其他的函数和类,但与文件名所描述的功能无关。
## [22/48] 请对下面的程序文件做一个概述: crazy_functions\批量总结PDF文档.py
@@ -187,9 +187,9 @@ toolbox.py是一个工具类库,其中主要包含了一些函数装饰器和
该程序文件是一个用于批量总结PDF文档的函数插件,使用了pdfminer插件和BeautifulSoup库来提取PDF文档的文本内容,对每个PDF文件分别进行处理并生成中英文摘要。同时,该程序文件还包括一些辅助工具函数和处理异常的装饰器。
## [24/48] 请对下面的程序文件做一个概述: crazy_functions\批量翻译PDF文档_多线程.py
## [24/48] 请对下面的程序文件做一个概述: crazy_functions\PDF_Translate.py
这个程序文件是一个Python脚本,文件名为“批量翻译PDF文档_多线程.py”。它主要使用了“toolbox”、“request_gpt_model_in_new_thread_with_ui_alive”、“request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency”、“colorful”等Python库和自定义的模块“crazy_utils”的一些函数。程序实现了一个批量翻译PDF文档的功能,可以自动解析PDF文件中的基础信息,递归地切割PDF文件,翻译和处理PDF论文中的所有内容,并生成相应的翻译结果文件包括md文件和html文件。功能比较复杂,其中需要调用多个函数和依赖库,涉及到多线程操作和UI更新。文件中有详细的注释和变量命名,代码比较清晰易读。
这个程序文件是一个Python脚本,文件名为“PDF_Translate.py”。它主要使用了“toolbox”、“request_gpt_model_in_new_thread_with_ui_alive”、“request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency”、“colorful”等Python库和自定义的模块“crazy_utils”的一些函数。程序实现了一个批量翻译PDF文档的功能,可以自动解析PDF文件中的基础信息,递归地切割PDF文件,翻译和处理PDF论文中的所有内容,并生成相应的翻译结果文件包括md文件和html文件。功能比较复杂,其中需要调用多个函数和依赖库,涉及到多线程操作和UI更新。文件中有详细的注释和变量命名,代码比较清晰易读。
## [25/48] 请对下面的程序文件做一个概述: crazy_functions\理解PDF文档内容.py
@@ -331,19 +331,19 @@ check_proxy.py, colorful.py, config.py, config_private.py, core_functional.py, c
这些程序源文件提供了基础的文本和语言处理功能、工具函数和高级插件,使 Chatbot 能够处理各种复杂的学术文本问题,包括润色、翻译、搜索、下载、解析等。
## 用一张Markdown表格简要描述以下文件的功能
crazy_functions\代码重写为全英文_多线程.py, crazy_functions\图片生成.py, crazy_functions\对话历史存档.py, crazy_functions\总结word文档.py, crazy_functions\总结音视频.py, crazy_functions\批量Markdown翻译.py, crazy_functions\批量总结PDF文档.py, crazy_functions\批量总结PDF文档pdfminer.py, crazy_functions\批量翻译PDF文档_多线程.py, crazy_functions\理解PDF文档内容.py, crazy_functions\生成函数注释.py, crazy_functions\联网的ChatGPT.py, crazy_functions\解析JupyterNotebook.py, crazy_functions\解析项目源代码.py, crazy_functions\询问多个大语言模型.py, crazy_functions\读文章写摘要.py。根据以上分析,用一句话概括程序的整体功能。
crazy_functions\代码重写为全英文_多线程.py, crazy_functions\图片生成.py, crazy_functions\Conversation_To_File.py, crazy_functions\总结word文档.py, crazy_functions\总结音视频.py, crazy_functions\Markdown_Translate.py, crazy_functions\批量总结PDF文档.py, crazy_functions\批量总结PDF文档pdfminer.py, crazy_functions\PDF_Translate.py, crazy_functions\理解PDF文档内容.py, crazy_functions\生成函数注释.py, crazy_functions\联网的ChatGPT.py, crazy_functions\解析JupyterNotebook.py, crazy_functions\解析项目源代码.py, crazy_functions\询问多个大语言模型.py, crazy_functions\读文章写摘要.py。根据以上分析,用一句话概括程序的整体功能。
| 文件名 | 功能简述 |
| --- | --- |
| 代码重写为全英文_多线程.py | 将Python源代码文件中的中文内容转化为英文 |
| 图片生成.py | 根据激励文本使用GPT模型生成相应的图像 |
| 对话历史存档.py | 将每次对话记录写入Markdown格式的文件中 |
| Conversation_To_File.py | 将每次对话记录写入Markdown格式的文件中 |
| 总结word文档.py | 对输入的word文档进行摘要生成 |
| 总结音视频.py | 对输入的音视频文件进行摘要生成 |
| 批量Markdown翻译.py | 将指定目录下的Markdown文件进行中英文翻译 |
| Markdown_Translate.py | 将指定目录下的Markdown文件进行中英文翻译 |
| 批量总结PDF文档.py | 对PDF文件进行切割和摘要生成 |
| 批量总结PDF文档pdfminer.py | 对PDF文件进行文本内容的提取和摘要生成 |
| 批量翻译PDF文档_多线程.py | 将指定目录下的PDF文件进行中英文翻译 |
| PDF_Translate.py | 将指定目录下的PDF文件进行中英文翻译 |
| 理解PDF文档内容.py | 对PDF文件进行摘要生成和问题解答 |
| 生成函数注释.py | 自动生成Python函数的注释 |
| 联网的ChatGPT.py | 使用网络爬虫和ChatGPT模型进行聊天回答 |

文件差异内容过多而无法显示 加载差异

查看文件

@@ -36,15 +36,15 @@
"总结word文档": "SummarizeWordDocument",
"解析ipynb文件": "ParseIpynbFile",
"解析JupyterNotebook": "ParseJupyterNotebook",
"对话历史存档": "ConversationHistoryArchive",
"载入对话历史存档": "LoadConversationHistoryArchive",
"Conversation_To_File": "ConversationHistoryArchive",
"载入Conversation_To_File": "LoadConversationHistoryArchive",
"删除所有本地对话历史记录": "DeleteAllLocalChatHistory",
"Markdown英译中": "MarkdownTranslateFromEngToChi",
"批量Markdown翻译": "BatchTranslateMarkdown",
"Markdown_Translate": "BatchTranslateMarkdown",
"批量总结PDF文档": "BatchSummarizePDFDocuments",
"批量总结PDF文档pdfminer": "BatchSummarizePDFDocumentsUsingPDFMiner",
"批量翻译PDF文档": "BatchTranslatePDFDocuments",
"批量翻译PDF文档_多线程": "BatchTranslatePDFDocumentsUsingMultiThreading",
"PDF_Translate": "BatchTranslatePDFDocumentsUsingMultiThreading",
"谷歌检索小助手": "GoogleSearchAssistant",
"理解PDF文档内容标准文件输入": "StandardFileInputForUnderstandingPDFDocumentContent",
"理解PDF文档内容": "UnderstandingPDFDocumentContent",
@@ -1492,7 +1492,7 @@
"交互功能模板函数": "InteractiveFunctionTemplateFunction",
"交互功能函数模板": "InteractiveFunctionFunctionTemplate",
"Latex英文纠错加PDF对比": "LatexEnglishErrorCorrectionWithPDFComparison",
"Latex输出PDF结果": "LatexOutputPDFResult",
"Latex_Function": "LatexOutputPDFResult",
"Latex翻译中文并重新编译PDF": "TranslateChineseAndRecompilePDF",
"语音助手": "VoiceAssistant",
"微调数据集生成": "FineTuneDatasetGeneration",

查看文件

@@ -6,17 +6,14 @@
"Latex英文纠错加PDF对比": "CorrectEnglishInLatexWithPDFComparison",
"下载arxiv论文并翻译摘要": "DownloadArxivPaperAndTranslateAbstract",
"Markdown翻译指定语言": "TranslateMarkdownToSpecifiedLanguage",
"批量翻译PDF文档_多线程": "BatchTranslatePDFDocuments_MultiThreaded",
"下载arxiv论文翻译摘要": "DownloadArxivPaperTranslateAbstract",
"解析一个Python项目": "ParsePythonProject",
"解析一个Golang项目": "ParseGolangProject",
"代码重写为全英文_多线程": "RewriteCodeToEnglish_MultiThreaded",
"解析一个CSharp项目": "ParsingCSharpProject",
"删除所有本地对话历史记录": "DeleteAllLocalConversationHistoryRecords",
"批量Markdown翻译": "BatchTranslateMarkdown",
"连接bing搜索回答问题": "ConnectBingSearchAnswerQuestion",
"Langchain知识库": "LangchainKnowledgeBase",
"Latex输出PDF结果": "OutputPDFFromLatex",
"把字符太少的块清除为回车": "ClearBlocksWithTooFewCharactersToNewline",
"Latex精细分解与转化": "DecomposeAndConvertLatex",
"解析一个C项目的头文件": "ParseCProjectHeaderFiles",
@@ -46,7 +43,7 @@
"高阶功能模板函数": "HighOrderFunctionTemplateFunctions",
"高级功能函数模板": "AdvancedFunctionTemplate",
"总结word文档": "SummarizingWordDocuments",
"载入对话历史存档": "LoadConversationHistoryArchive",
"载入Conversation_To_File": "LoadConversationHistoryArchive",
"Latex中译英": "LatexChineseToEnglish",
"Latex英译中": "LatexEnglishToChinese",
"连接网络回答问题": "ConnectToNetworkToAnswerQuestions",
@@ -70,7 +67,6 @@
"读文章写摘要": "ReadArticleWriteSummary",
"生成函数注释": "GenerateFunctionComments",
"解析项目本身": "ParseProjectItself",
"对话历史存档": "ConversationHistoryArchive",
"专业词汇声明": "ProfessionalTerminologyDeclaration",
"解析docx": "ParseDocx",
"解析源代码新": "ParsingSourceCodeNew",
@@ -97,5 +93,37 @@
"多智能体": "MultiAgent",
"图片生成_DALLE2": "ImageGeneration_DALLE2",
"图片生成_DALLE3": "ImageGeneration_DALLE3",
"图片修改_DALLE2": "ImageModification_DALLE2"
"图片修改_DALLE2": "ImageModification_DALLE2",
"生成多种Mermaid图表": "GenerateMultipleMermaidCharts",
"知识库文件注入": "InjectKnowledgeBaseFiles",
"PDF翻译中文并重新编译PDF": "TranslatePDFToChineseAndRecompilePDF",
"随机小游戏": "RandomMiniGame",
"互动小游戏": "InteractiveMiniGame",
"解析历史输入": "ParseHistoricalInput",
"高阶功能模板函数示意图": "HighOrderFunctionTemplateDiagram",
"载入对话历史存档": "LoadChatHistoryArchive",
"对话历史存档": "ChatHistoryArchive",
"解析PDF_DOC2X_转Latex": "ParsePDF_DOC2X_toLatex",
"解析PDF_基于DOC2X": "ParsePDF_basedDOC2X",
"解析PDF_简单拆解": "ParsePDF_simpleDecomposition",
"解析PDF_DOC2X_单文件": "ParsePDF_DOC2X_singleFile",
"注释Python项目": "CommentPythonProject",
"注释源代码": "CommentSourceCode",
"log亮黄": "log_yellow",
"log亮绿": "log_green",
"log亮红": "log_red",
"log亮紫": "log_purple",
"log亮蓝": "log_blue",
"Rag问答": "RagQA",
"sprint红": "sprint_red",
"sprint绿": "sprint_green",
"sprint黄": "sprint_yellow",
"sprint蓝": "sprint_blue",
"sprint紫": "sprint_purple",
"sprint靛": "sprint_indigo",
"sprint亮红": "sprint_bright_red",
"sprint亮绿": "sprint_bright_green",
"sprint亮黄": "sprint_bright_yellow",
"sprint亮蓝": "sprint_bright_blue",
"sprint亮紫": "sprint_bright_purple"
}

查看文件

@@ -35,15 +35,15 @@
"总结word文档": "SummarizeWordDocument",
"解析ipynb文件": "ParseIpynbFile",
"解析JupyterNotebook": "ParseJupyterNotebook",
"对话历史存档": "ConversationHistoryArchive",
"载入对话历史存档": "LoadConversationHistoryArchive",
"Conversation_To_File": "ConversationHistoryArchive",
"载入Conversation_To_File": "LoadConversationHistoryArchive",
"删除所有本地对话历史记录": "DeleteAllLocalConversationHistoryRecords",
"Markdown英译中": "MarkdownEnglishToChinese",
"批量Markdown翻译": "BatchMarkdownTranslation",
"Markdown_Translate": "BatchMarkdownTranslation",
"批量总结PDF文档": "BatchSummarizePDFDocuments",
"批量总结PDF文档pdfminer": "BatchSummarizePDFDocumentsPdfminer",
"批量翻译PDF文档": "BatchTranslatePDFDocuments",
"批量翻译PDF文档_多线程": "BatchTranslatePdfDocumentsMultithreaded",
"PDF_Translate": "BatchTranslatePdfDocumentsMultithreaded",
"谷歌检索小助手": "GoogleSearchAssistant",
"理解PDF文档内容标准文件输入": "StandardFileInputForUnderstandingPdfDocumentContent",
"理解PDF文档内容": "UnderstandingPdfDocumentContent",
@@ -1468,7 +1468,7 @@
"交互功能模板函数": "InteractiveFunctionTemplateFunctions",
"交互功能函数模板": "InteractiveFunctionFunctionTemplates",
"Latex英文纠错加PDF对比": "LatexEnglishCorrectionWithPDFComparison",
"Latex输出PDF结果": "OutputPDFFromLatex",
"Latex_Function": "OutputPDFFromLatex",
"Latex翻译中文并重新编译PDF": "TranslateLatexToChineseAndRecompilePDF",
"语音助手": "VoiceAssistant",
"微调数据集生成": "FineTuneDatasetGeneration",

查看文件

@@ -3,7 +3,7 @@
## 1. 安装额外依赖
```
pip install --upgrade pyOpenSSL scipy git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git
pip install --upgrade pyOpenSSL webrtcvad scipy git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git
```
如果因为特色网络问题导致上述命令无法执行:

58
docs/use_tts.md 普通文件
查看文件

@@ -0,0 +1,58 @@
# 使用TTS文字转语音
## 1. 使用EDGE-TTS简单
将本项目配置项修改如下即可
```
TTS_TYPE = "EDGE_TTS"
EDGE_TTS_VOICE = "zh-CN-XiaoxiaoNeural"
```
## 2. 使用SoVITS需要有显卡
使用以下docker-compose.yml文件,先启动SoVITS服务API
1. 创建以下文件夹结构
```shell
.
├── docker-compose.yml
└── reference
├── clone_target_txt.txt
└── clone_target_wave.mp3
```
2. 其中`docker-compose.yml`为
```yaml
version: '3.8'
services:
gpt-sovits:
image: fuqingxu/sovits_gptac_trim:latest
container_name: sovits_gptac_container
working_dir: /workspace/gpt_sovits_demo
environment:
- is_half=False
- is_share=False
volumes:
- ./reference:/reference
ports:
- "19880:9880" # 19880 为 sovits api 的暴露端口,记住它
shm_size: 16G
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: "all"
capabilities: [gpu]
command: bash -c "python3 api.py"
```
3. 其中`clone_target_wave.mp3`为需要克隆的角色音频,`clone_target_txt.txt`为该音频对应的文字文本( https://wiki.biligame.com/ys/%E8%A7%92%E8%89%B2%E8%AF%AD%E9%9F%B3
4. 运行`docker-compose up`
5. 将本项目配置项修改如下即可
(19880 为 sovits api 的暴露端口,与docker-compose.yml中的端口对应)
```
TTS_TYPE = "LOCAL_SOVITS_API"
GPT_SOVITS_URL = "http://127.0.0.1:19880"
```
6. 启动本项目

某些文件未显示,因为此 diff 中更改的文件太多 显示更多