Compare commits


735 commits

Author SHA1 Message Commit date
binary-husky
491174095a Update docker-compose instructions 2023-10-07 11:59:06 +08:00
binary-husky
49cea97822 Enable automatic theme switching 2023-10-06 10:36:30 +08:00
binary-husky
6310b65d70 Recompile Gradio to improve the user experience 2023-10-06 10:32:03 +08:00
binary-husky
93c76e1809 Update the bundled Gradio version 2023-10-06 09:54:07 +08:00
binary-husky
f64cf7a3d1 update translation matrix 2023-10-02 14:24:01 +08:00
binary-husky
fdffbee1b0 Update toolbox.py 2023-09-30 09:56:30 +08:00
binary-husky
87ccd1a89a Update crazy_functional.py 2023-09-27 18:35:06 +08:00
binary-husky
87b9734986 Fix bug: 'copiedIcon' defined twice 2023-09-27 16:35:58 +08:00
binary-husky
d2d5665c37 Allow using the proxy during module warm-up 2023-09-27 15:53:45 +08:00
binary-husky
0844b6e9cf Support accessing the GROBID service through a proxy 2023-09-27 15:40:55 +08:00
binary-husky
9cb05e5724 Change layout 2023-09-27 15:20:28 +08:00
binary-husky
80b209fa0c Merge branch 'frontier' 2023-09-27 15:19:07 +08:00
binary-husky
8d4cb05738 Shortcut for the Matlab project analysis plugin 2023-09-26 10:16:38 +08:00
binary-husky
31f4069563 Improve the polishing and proofreading prompts 2023-09-25 17:46:28 +08:00
binary-husky
8ba6fc062e Merge branch 'frontier' of github.com:binary-husky/chatgpt_academic into frontier 2023-09-23 23:59:30 +08:00
binary-husky
c0c2d14e3d better scrollbar 2023-09-23 23:58:32 +08:00
binary-husky
f0a5c49a9c Merge branch 'frontier' of github.com:binary-husky/chatgpt_academic into frontier 2023-09-23 23:47:42 +08:00
binary-husky
9333570ab7 Reduce the minimum size of basic buttons such as Reset 2023-09-23 23:47:25 +08:00
binary-husky
d6eaaad962 Prevent Gradio from displaying the misleading share=True message 2023-09-23 23:23:23 +08:00
binary-husky
e24f077b68 Explicitly add the azure-gpt-4 option 2023-09-23 23:06:58 +08:00
binary-husky
dc5bb9741a Version update 2023-09-23 22:45:07 +08:00
binary-husky
b383b45191 version 3.54 beta 2023-09-23 22:44:18 +08:00
binary-husky
2d8f37baba Distinguish between proxy usage scenarios 2023-09-23 22:43:15 +08:00
binary-husky
409927ef8e Unify the transformers version 2023-09-23 22:26:28 +08:00
binary-husky
5b231e0170 Add a copy-all button 2023-09-23 22:11:29 +08:00
binary-husky
87f629bb37 Add gpt-4-32k 2023-09-23 20:24:13 +08:00
binary-husky
3672c97a06 Dynamic code interpreter 2023-09-23 01:51:05 +08:00
binary-husky
b6ee3e9807 Merge pull request #1121 from binary-husky/frontier
Add a disable-cache option to the arXiv translation plugin
2023-09-21 09:33:19 +08:00
binary-husky
d56bc280e9 Add a disable-cache option 2023-09-20 22:04:15 +08:00
qingxu fu
d5fd00c15d Tweak Dockerfile 2023-09-20 10:02:10 +08:00
binary-husky
5e647ff149 Merge branch 'master' into frontier 2023-09-19 17:21:02 +08:00
binary-husky
868faf00cc Fix docker compose 2023-09-19 17:10:57 +08:00
binary-husky
a0286c39b9 Update README 2023-09-19 17:08:20 +08:00
binary-husky
9cced321f1 Revise README 2023-09-19 16:55:39 +08:00
binary-husky
3073935e24 Revise README, release version 3.53 2023-09-19 16:49:33 +08:00
binary-husky
ef6631b280 Change TOKEN_LIMIT_PER_FRAGMENT to 1024 2023-09-19 16:31:36 +08:00
binary-husky
0801e4d881 Merge pull request #1111 from kaixindelele/only_chinese_pdf
Improve the results of the PDF translation plugin
2023-09-19 15:56:04 +08:00
qingxu fu
ae08cfbcae Fix minor bug 2023-09-19 15:55:27 +08:00
qingxu fu
1c0d5361ea Adjust the minimum height of the status bar 2023-09-19 15:52:42 +08:00
qingxu fu
278464bfb7 Merge duplicate functions 2023-09-18 23:03:23 +08:00
qingxu fu
2a6996f5d0 Fix compatibility with Azure ENDPOINT formats 2023-09-18 21:19:02 +08:00
qingxu fu
84b11016c6 Also output the mmd file after nougat processing finishes 2023-09-18 15:21:30 +08:00
qingxu fu
7e74d3d699 Adjust button positions 2023-09-18 15:19:21 +08:00
qingxu fu
2cad8e2694 Support switching themes dynamically 2023-09-17 00:15:28 +08:00
qingxu fu
e765ec1223 dynamic theme 2023-09-17 00:02:49 +08:00
kaixindelele
471a369bb8 Paper translation outputs Chinese only 2023-09-16 22:09:44 +08:00
binary-husky
760ff1840c Fix a loop bug 2023-09-15 17:08:23 +08:00
binary-husky
9905122fc2 Fix TeX file matching bug 2023-09-15 12:55:41 +08:00
binary-husky
abea0d07ac Fix logging bug 2023-09-15 11:00:30 +08:00
binary-husky
16ff5ddcdc Version 3.52 2023-09-14 23:07:12 +08:00
binary-husky
1c4cb340ca Fix notification bug for leftover documents 2023-09-14 22:45:45 +08:00
binary-husky
5ba8ea27d1 Replace print with logging 2023-09-14 22:33:07 +08:00
binary-husky
567c6530d8 Add NOUGAT status messages and warnings for incorrect operations 2023-09-14 21:38:47 +08:00
binary-husky
a3f36668a8 Fix LaTeX misidentifying the main file 2023-09-14 17:51:41 +08:00
binary-husky
a1cc2f733c Fix nougat thread lock release bug 2023-09-14 15:26:03 +08:00
binary-husky
0937f37388 Fix Predict button parameters 2023-09-14 11:02:40 +08:00
binary-husky
74f35e3401 Add a notice for cases where the Void Terminal occasionally produces no output file 2023-09-14 01:51:55 +08:00
binary-husky
ab7999c71a Correct the source-code scope of this project 2023-09-14 01:00:38 +08:00
binary-husky
544771db9a Hide absolute paths in conversation history 2023-09-14 00:53:15 +08:00
binary-husky
ec9d030457 Make the upload path and log path unified, configurable variables 2023-09-14 00:51:25 +08:00
binary-husky
14de282302 Add a thread lock for nougat; merge redundant code 2023-09-13 23:21:00 +08:00
binary-husky
fb5467b85b Update plugin system hints 2023-09-12 19:13:36 +08:00
binary-husky
c4c6465927 Resolve issue #1097 2023-09-12 18:57:50 +08:00
qingxu fu
99a1cd6f9f Add pypinyin dependency 2023-09-12 12:20:05 +08:00
qingxu fu
7e73a255f4 Revise knowledge base plugin messages 2023-09-12 11:47:34 +08:00
qingxu fu
4b5f13bff2 Fix knowledge base dependency issues 2023-09-12 11:35:31 +08:00
qingxu fu
d495b73456 Support more UI skins; add a dark/light mode toggle 2023-09-11 22:55:32 +08:00
qingxu fu
e699b6b13f Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-09-11 14:49:37 +08:00
qingxu fu
eb150987f0 Handle a third-party one-api bug where the done packet is missing 2023-09-11 14:49:30 +08:00
binary-husky
34784333dc Set the left/right ratio of the merged PDF to 95% 2023-09-10 17:22:35 +08:00
binary-husky
28d777a96b Fix error message 2023-09-10 16:52:35 +08:00
qingxu fu
c45fa88684 update translation matrix 2023-09-09 21:57:24 +08:00
binary-husky
ad9807dd14 Update Void Terminal hints 2023-09-09 20:32:44 +08:00
binary-husky
2a51715075 Fix Dockerfile 2023-09-09 20:15:46 +08:00
binary-husky
7c307d8964 Fix compatibility between the source-code analysis module and the Void Terminal 2023-09-09 19:33:05 +08:00
binary-husky
baaacc5a7b Update README.md 2023-09-09 19:11:21 +08:00
binary-husky
6faf5947c9 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-09-09 18:30:59 +08:00
binary-husky
571335cbc4 fix docker file 2023-09-09 18:30:43 +08:00
binary-husky
7d5abb6d69 Merge pull request #1077 from jsz14897502/master
Change how the Google Scholar search assistant retrieves abstracts
2023-09-09 18:24:30 +08:00
binary-husky
a0f592308a Merge branch 'master' into jsz14897502-master 2023-09-09 18:22:29 +08:00
binary-husky
e512d99879 Add some delay to avoid triggering anti-scraping measures 2023-09-09 18:22:22 +08:00
binary-husky
e70b636513 Fix math formula detection bug 2023-09-09 17:50:38 +08:00
binary-husky
408b8403fe Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-09-08 12:10:22 +08:00
binary-husky
74f8cb3511 update dockerfile 2023-09-08 12:10:16 +08:00
qingxu fu
2202cf3701 remove proxy message 2023-09-08 11:11:53 +08:00
qingxu fu
cce69beee9 update error message 2023-09-08 11:08:02 +08:00
qingxu fu
347124c967 update scipdf_parser dep 2023-09-08 10:43:20 +08:00
qingxu fu
77a6105a9a Revise demo example 2023-09-08 09:52:29 +08:00
qingxu fu
13c9606af7 Fix the error message shown when a PDF download fails 2023-09-08 09:47:29 +08:00
binary-husky
bac6810e75 Revise usage hints 2023-09-08 09:38:16 +08:00
binary-husky
c176187d24 Fix inaccurate error message caused by a function return value 2023-09-07 23:46:54 +08:00
binary-husky
31d5ee6ccc Update README.md 2023-09-07 23:05:54 +08:00
binary-husky
5e0dc9b9ad Fix timestamp issue in the PDF download path 2023-09-07 18:51:09 +08:00
binary-husky
4c6f3aa427 CodeInterpreter 2023-09-07 17:45:44 +08:00
binary-husky
d7331befc1 add note 2023-09-07 17:42:47 +08:00
binary-husky
63219baa21 Fix abnormal display at the end of sentences during voice conversations 2023-09-07 17:04:40 +08:00
binary-husky
97cb9a4adc full capacity docker file 2023-09-07 15:09:38 +08:00
binary-husky
24f41b0a75 new docker file 2023-09-07 00:45:03 +08:00
binary-husky
bfec29e9bc new docker file 2023-09-07 00:43:31 +08:00
binary-husky
dd9e624761 add new dockerfile 2023-09-07 00:40:11 +08:00
binary-husky
7855325ff9 update dockerfiles 2023-09-06 23:33:15 +08:00
binary-husky
2c039ff5c9 add session 2023-09-06 22:19:32 +08:00
binary-husky
9a5ee86434 Merge pull request #1084 from eltociear/patch-2
Update README.md
2023-09-06 21:56:39 +08:00
binary-husky
d6698db257 Translate PDF papers with nougat 2023-09-06 15:32:11 +08:00
Ikko Eltociear Ashimine
b2d03bf2a3 Update README.md
arbitary -> arbitrary
2023-09-06 15:30:12 +09:00
binary-husky
2f83b60fb3 Add a message when search fails 2023-09-06 12:36:59 +08:00
binary-husky
d183e34461 Add a switch for searching all versions 2023-09-06 11:42:29 +08:00
binary-husky
fb78569335 Merge branch 'master' of https://github.com/jsz14897502/gpt_academic into jsz14897502-master 2023-09-06 10:27:52 +08:00
qingxu fu
12c8cd75ee Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-09-06 10:24:14 +08:00
qingxu fu
0e21e3e2e7 Fix missing error message when the iFlytek APPID is not filled in 2023-09-06 10:24:11 +08:00
binary-husky
fda1e87278 Update stale.yml 2023-09-06 10:19:21 +08:00
binary-husky
1092031d77 Create stale.yml 2023-09-06 10:15:52 +08:00
binary-husky
f0482d3bae Update docker-compose.yml 2023-09-04 12:39:25 +08:00
binary-husky
b6ac3d0d6c Update README.md 2023-09-04 12:34:55 +08:00
binary-husky
3344ffcb8b Update README.md 2023-09-04 11:41:52 +08:00
binary-husky
82936f71b6 Update README.md 2023-09-04 11:37:47 +08:00
binary-husky
51e809c09e Update README.md 2023-09-04 11:34:46 +08:00
qingxu fu
713df396dc Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-09-03 16:46:30 +08:00
qingxu fu
23a42d93df update translation matrix 2023-09-03 16:46:27 +08:00
binary-husky
0ef06683dc Update README.md 2023-09-03 16:35:03 +08:00
qingxu fu
843113ba0f fix minor bugs 2023-09-03 16:20:05 +08:00
binary-husky
79080290c6 Merge pull request #1074 from Kilig947/plugin_classification
Add plugin category selection to the plugin section
2023-09-03 15:41:45 +08:00
qingxu fu
9bd2023a8e revise version check 2023-09-03 15:40:41 +08:00
qingxu fu
0d6e32d31a version 3.5 release 2023-09-03 15:38:10 +08:00
qingxu fu
0418257218 Merge branch 'master' into Kilig947-plugin_classification 2023-09-03 15:35:16 +08:00
qingxu fu
a3e6fc0141 Fix ERNIE Bot (Wenxin Yiyan) interface issue 2023-09-03 15:32:39 +08:00
qingxu fu
1dd165a3cd ui layout improve 2023-09-03 14:47:22 +08:00
qingxu fu
e666b5269e Improve the Void Terminal 2023-09-03 00:53:57 +08:00
qingxu fu
0b70e9df7b Optimize the Void Terminal invocation flow 2023-09-02 23:49:56 +08:00
qingxu fu
1639796041 support file implementation 2023-09-02 22:22:41 +08:00
jsz14
03164bcb6f fix: handle the case where not all versions are retrieved 2023-09-02 19:58:24 +08:00
qingxu fu
d0af074225 change layout 2023-09-02 18:19:19 +08:00
binary-husky
6d7f3feab3 Improve theme appearance; add a high-contrast theme 2023-09-01 10:45:22 +08:00
binary-husky
045b7f6312 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-09-01 10:34:33 +08:00
binary-husky
116b7ce12f Support the iFlytek Spark Cognitive LLM v2 2023-09-01 10:34:26 +08:00
qingxu fu
8b0905c076 Improve the Void Terminal success rate 2023-08-31 18:04:31 +08:00
qingxu fu
b69140307b Fix chat box alignment issue 2023-08-31 16:24:00 +08:00
qingxu fu
b31abbcad3 Each plugin can belong to multiple groups 2023-08-31 15:59:19 +08:00
qingxu fu
2d5a1fbc12 Modify frontend code 2023-08-31 00:21:24 +08:00
jsz14
d052d425af Change how the Google Scholar search assistant retrieves abstracts 2023-08-30 19:14:01 +08:00
qingxu fu
89de49f31e Rename variables; tidy up the configuration list 2023-08-30 16:00:27 +08:00
w_xiaolizu
a208782049 Add plugin categories 2023-08-30 14:46:34 +08:00
qingxu fu
eb802ee975 implement two stage plugin selection 2023-08-29 23:53:47 +08:00
qingxu fu
f40d48b014 fix typing problems 2023-08-29 23:46:40 +08:00
qingxu fu
ef4203f5ca Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-08-29 23:25:10 +08:00
qingxu fu
adf93195e8 Try dispatching plugins via natural language 2023-08-29 23:25:06 +08:00
binary-husky
3e5cdbaf68 Update README.md 2023-08-29 18:29:45 +08:00
binary-husky
27cab3b38a Update README.md 2023-08-29 18:29:16 +08:00
qingxu fu
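09d38e4abf Disable dynamic configuration changes by default for security reasons 2023-08-29 17:50:45 +08:00
qingxu fu
7efb5cb6f5 Remove test samples introduced early on 2023-08-29 17:43:55 +08:00
qingxu fu
31ff6e1e7a Support modifying the project's own configuration via natural language 2023-08-29 17:37:41 +08:00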
qingxu fu
2fa3d47887 fix json read error 2023-08-29 12:42:06 +08:00
binary-husky
2cca46375c Update crazy_functional.py 2023-08-28 17:47:37 +08:00
binary-husky
06410b593c Update config.py 2023-08-28 16:16:30 +08:00
binary-husky
545c9f47de Update README.md 2023-08-28 11:59:23 +08:00
binary-husky
973ad41bde add a space 2023-08-28 02:03:30 +08:00
binary-husky
3fa7416eb2 notify dummy action 2023-08-28 01:56:15 +08:00
binary-husky
ec76d3dcc4 Support high-accuracy PDF translation with GROBID 2023-08-28 01:25:44 +08:00
binary-husky
3f27bec94b Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-08-28 01:22:26 +08:00
binary-husky
ed11269aef Support high-accuracy PDF translation with GROBID 2023-08-28 01:22:20 +08:00
qingxu fu
6c653734ec Fix 3rd party chatgpt compat 2023-08-26 17:57:59 +08:00
qingxu fu
19bd0c35ed Fix parsing of the LaTeX \input command 2023-08-25 21:20:15 +08:00
binary-husky
3f4c4ebc29 Adjust comments 2023-08-25 13:16:18 +08:00
binary-husky
6cc7d4ed69 Fix issues caused by ERNIE Bot's maximum text length limit 2023-08-25 13:09:08 +08:00
binary-husky
67fff17917 3.49: integrate the Baidu Qianfan platform and ERNIE Bot 2023-08-25 12:45:08 +08:00
binary-husky
8fce49fa02 Support Baidu Cloud Qianfan and ERNIE Bot 2023-08-25 12:31:51 +08:00
binary-husky
30f28b37c3 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-08-21 22:09:05 +08:00
binary-husky
6a5681dd0a add llama2 2023-08-21 22:08:57 +08:00
binary-husky
dacc282763 Update README.md 2023-08-21 22:00:51 +08:00
binary-husky
9720bec5e5 Interface with LLaMa2 from huggingface 2023-08-21 21:54:21 +08:00
binary-husky
8b3b883fce Update README.md 2023-08-17 10:02:55 +08:00
qingxu fu
4dc0f8e57a Modify docker-compose; add support for Alibaba Qwen 2023-08-17 10:00:42 +08:00
qingxu fu
5e48fc98ed Add local cache deletion 2023-08-16 22:49:46 +08:00
qingxu fu
2ff8dc787e interface with ChatGPT-to-API 2023-08-16 22:21:51 +08:00
qingxu fu
cd38d1697c fix missing finish_reason problem 2023-08-16 21:40:34 +08:00
qingxu fu
00f63cb0bc configure utf8 encoding 2023-08-16 21:29:16 +08:00
binary-husky
dc7fab3c19 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-08-14 17:27:33 +08:00
binary-husky
d1b5359e2b fix github action 2023-08-14 17:27:13 +08:00
binary-husky
0597ffea2e Update README.md 2023-08-14 16:37:07 +08:00
binary-husky
d16329c1af resolve sparkapi on_close error 2023-08-14 11:31:05 +08:00
binary-husky
d5b4d7ab90 better github action 2023-08-14 11:28:52 +08:00
binary-husky
8199a9a12e Update requirements.txt 2023-08-14 11:23:15 +08:00
binary-husky
cb10a8abec Update requirements.txt 2023-08-14 10:54:46 +08:00
binary-husky
0dbcda89b7 add websocket dep 2023-08-14 10:32:31 +08:00
binary-husky
78a8259b82 Update bridge_all.py 2023-08-14 10:24:59 +08:00
binary-husky
f22fdb4f94 Merge pull request #1040 from Keldos-Li/fix-Chuanhu-theme
Adjust and fix the [Chuanhu small-and-beautiful] theme style
2023-08-14 10:08:01 +08:00
binary-husky
450645a9d0 version 3.48 2023-08-14 03:09:56 +08:00
binary-husky
af23730f8f Integrate the iFlytek Spark LLM 2023-08-14 03:08:15 +08:00
Keldos
0b11260d6f fix: fix the slider issue in the Chuanhu theme 2023-08-14 00:15:38 +08:00
Keldos
31ab97dd09 feat: adjust the Chuanhu theme style 2023-08-14 00:14:44 +08:00
binary-husky
c0c4834cfc fix interact message 2023-08-13 22:25:01 +08:00
binary-husky
2dae40f4ba Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-08-13 21:34:33 +08:00
binary-husky
587c7400d1 xunfei spark api test 2023-08-13 21:34:27 +08:00
binary-husky
8dd2e2a6b7 Update bug_report.yml 2023-08-13 21:25:21 +08:00
binary-husky
aaf4f37403 Merge pull request #1014 from hongyi-zhao/master
Fix the reverse proxy based OpenAI access via https://github.com/acheong08/ChatGPT-to-API/.
2023-08-13 20:57:32 +08:00
binary-husky
3e2e81a968 add chatgpt website 2023-08-13 20:55:18 +08:00
binary-husky
cc1be5585b Merge branch 'master' of https://github.com/hongyi-zhao/gpt_academic into hongyi-zhao-master 2023-08-13 20:50:09 +08:00
binary-husky
5050016b22 theme typo fix 2023-08-12 20:28:20 +08:00
binary-husky
7662196514 update tests 2023-08-12 14:09:19 +08:00
binary-husky
8ddaca09e0 add commandline helper 2023-08-12 12:11:49 +08:00
binary-husky
71c692dcef Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-08-07 02:11:57 +08:00
binary-husky
184e417fec handle local llm dependency error properly 2023-08-07 02:11:48 +08:00
binary-husky
7a99560183 Update README.md 2023-08-07 02:01:35 +08:00
binary-husky
48f4d6aa2a Update README.md 2023-08-07 02:00:39 +08:00
binary-husky
c17fc2a9b5 I am a large language model from DAMO Academy, and my name is Tongyi Qianwen. 2023-08-07 01:58:35 +08:00
binary-husky
4d70b3786f interface with qwen 2023-08-07 01:24:41 +08:00
binary-husky
9bee676cd2 Merge pull request #1009 from ValeriaWong/master
feat(chatglm_int8_onnx): CPU-only inference, needs at most 8GB of RAM; inference speed not yet benchmarked, limited token count, no streaming output yet #…
2023-08-07 01:13:09 +08:00
binary-husky
0a37106692 reverse cmd_to_install 2023-08-07 01:11:44 +08:00
binary-husky
57d4541d4e fix minor bug in chatglm-onnx 2023-08-07 01:07:55 +08:00
binary-husky
d7dd586f09 introduce unified base class for local llm models 2023-08-07 00:57:52 +08:00
binary-husky
b6b53ce2a4 Merge branch 'master' of https://github.com/ValeriaWong/chatgpt_academic into ValeriaWong-master 2023-08-06 22:17:52 +08:00
505030475
43809c107d update multi-language module 2023-08-04 23:53:23 +08:00
505030475
1721edc990 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-08-04 23:30:00 +08:00
Hongyi Zhao
bfb7aab4a0 Fix the reverse proxy based OpenAI access via https://github.com/acheong08/ChatGPT-to-API/.
See https://github.com/binary-husky/gpt_academic/issues/900#issuecomment-1658463065 for more detailed discussions.
2023-08-02 18:03:49 +08:00
binary-husky
f4a87d6380 Update README.md 2023-08-01 12:54:50 +08:00
ValeriaWong
c0c337988f feat(chatglm_int8_onnx): CPU-only inference, needs at most 8GB of RAM; inference speed not yet benchmarked, limited token count, no streaming output yet #1008 2023-08-01 00:48:57 +08:00
binary-husky
27f65c251a Update 图片生成.py 2023-07-31 15:57:18 +08:00
qingxu fu
87f099f740 use get_log_folder() to manage log folder - step 1 2023-07-31 12:28:32 +08:00
qingxu fu
484f16e365 修复空输入触发的BUG 2023-07-31 12:08:07 +08:00
qingxu fu
37afcc709b interface with void terminal 2023-07-31 11:20:01 +08:00
binary-husky
9cbe9f240d Update README.md 2023-07-30 14:08:21 +08:00
binary-husky
f6567c02f6 update translation matrix for japanese and t-zh 2023-07-30 13:58:11 +08:00
binary-husky
8c83061a93 more explanation 2023-07-30 13:51:21 +08:00
binary-husky
23f2adfdc3 update translation matrix 2023-07-30 13:44:11 +08:00
binary-husky
61698444b1 change comments 2023-07-30 13:36:34 +08:00
binary-husky
109afcf8f6 Merge remote-tracking branch 'origin/enable_clear_history_option' 2023-07-30 13:27:10 +08:00
binary-husky
19ef6a530a add additional source for checking proxy ip 2023-07-30 13:23:35 +08:00
binary-husky
e08bd9669e increase audio assistant watch dog patience 2023-07-30 12:48:43 +08:00
binary-husky
155a7e1174 Merge pull request #998 from awwaawwa/enable_clear_history_option
Add a notice when history messages are cleared automatically
2023-07-28 21:10:31 +08:00
binary-husky
86e33ea99a Update core_functional.py 2023-07-28 21:09:51 +08:00
qingxu fu
524684f8bd fix the markdown translation functionality 2023-07-28 21:03:20 +08:00
qingxu fu
2a362cec84 markdown translation handle github index page 2023-07-28 20:20:30 +08:00
505030475
2747c23868 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-07-28 10:35:50 +08:00
binary-husky
f446dbb62d Update README.md 2023-07-28 09:54:03 +08:00
binary-husky
8d37d94e2c Update README.md 2023-07-28 09:53:17 +08:00
awwaawwa
e4ba0e6c85 add clear history tips 2023-07-27 23:07:59 +08:00
505030475
4216c5196e verify ignore history practice 2023-07-27 22:30:55 +08:00
binary-husky
2df660a718 Merge pull request #992 from yangchuansheng/master
Update README.md
2023-07-26 22:46:43 +08:00
binary-husky
bb496a9c2c Update README.md 2023-07-26 22:46:21 +08:00
binary-husky
4e0737c0c2 Update README.md 2023-07-26 22:46:02 +08:00
binary-husky
4bb3cba5c8 Update README.md 2023-07-26 18:53:42 +08:00
qingxu fu
08b9b0d140 improve audio assistant documents 2023-07-26 18:51:33 +08:00
qingxu fu
3577a72a3b add audio assistant docker compose solution 2023-07-26 18:39:32 +08:00
qingxu fu
0328d6f498 add ALIYUN ACCESSKEY SECRET 2023-07-26 18:28:15 +08:00
qingxu fu
d437305a4f add audio assistant docker 2023-07-26 18:16:59 +08:00
qingxu fu
c4899bcb20 long-term aliyun access 2023-07-26 18:09:28 +08:00
Carson Yang
4295764f8c Update README.md
Add a Sealos deployment option
2023-07-25 16:38:37 +08:00
binary-husky
e4e2430255 version 3.47 2023-07-24 19:58:47 +08:00
binary-husky
1732127a28 Merge pull request #979 from fenglui/master
Add ChatGLM int4 configuration support so that GPUs with little VRAM can also run ChatGLM
2023-07-24 19:52:27 +08:00
binary-husky
56bb8b6498 improve re efficiency 2023-07-24 18:50:29 +08:00
binary-husky
e93b6fa3a6 Add GLM INT8 2023-07-24 18:19:57 +08:00
binary-husky
dd4ba0ea22 Merge branch 'master' of https://github.com/fenglui/gpt_academic into fenglui-master 2023-07-24 18:06:15 +08:00
binary-husky
c2701c9ce5 Merge pull request #986 from one-pr/git-clone
Clone only the latest code by default to reduce the size of git clone
2023-07-24 17:48:35 +08:00
woclass
2f019ce359 Optimize the other git clone commands in README.md 2023-07-24 15:14:48 +08:00
woclass
c5b147aeb7 Clone only the latest code by default to reduce the size of git clone 2023-07-24 15:14:42 +08:00
fenglui
5813d65e52 Add ChatGLM int4 configuration support so that GPUs with little VRAM can also run ChatGLM 2023-07-22 08:29:15 +08:00
binary-husky
a393edfaa4 ALLOW CUSTOM API KEY PATTERN 2023-07-21 22:49:07 +08:00
binary-husky
dd7a01cda5 Merge pull request #976 from fenglui/master
fix msg.data.split(DELIMITER) exception when msg.data is int
2023-07-21 17:02:29 +08:00
fenglui
00a3b91f95 fix msg.data.split(DELIMITER) exception when msg.data is int 2023-07-21 03:51:33 +08:00
qingxu fu
61ba544282 add latex test samples 2023-07-20 19:49:23 +08:00
qingxu fu
b5b8c123e4 latex plugin stability improvement 2023-07-20 19:39:22 +08:00
qingxu fu
d9ceba959f expand range after failure 2023-07-20 18:39:02 +08:00
qingxu fu
6b5b040701 remove pdf merge 2023-07-20 18:29:06 +08:00
qingxu fu
4f4c09a5f3 Strengthen LaTeX error repair 2023-07-20 18:08:22 +08:00
qingxu fu
067bc97cce Merge branch 'interface-interlm' of https://github.com/binary-husky/chatgpt_academic into interface-interlm 2023-07-20 12:46:52 +08:00
qingxu fu
7368580cd6 concat pdf after translation 2023-07-20 12:46:48 +08:00
binary-husky
df90db210c Merge branch 'master' into interface-interlm 2023-07-20 11:40:45 +08:00
binary-husky
0927ed20a2 edit default configuration 2023-07-20 11:39:35 +08:00
binary-husky
73b22f85be compat third party gpt error handle 2023-07-20 11:09:22 +08:00
binary-husky
b8d77557b0 Update README.md 2023-07-20 10:12:42 +08:00
binary-husky
99b8fce8f3 Merge pull request #965 from QQisQQ/patch-2
Fix New Bing error code 200
2023-07-19 10:15:15 +08:00
binary-husky
16364f1b2d Merge pull request #966 from doujiang-zheng/master
Add timestamp for chat_secrets.log and disable the verbose httpx log.
2023-07-19 10:14:36 +08:00
doujiang-zheng
3b88e00cfb Add timestamp for chat_secrets.log and disable the verbose httpx log. 2023-07-19 09:43:59 +08:00
QQisQQ
0c8c539e9b Fix New Bing error code 200
modify from 16e00af9d5

works for my issue:
```
Traceback (most recent call last):
  File "./request_llm/bridge_newbingfree.py", line 152, in run
    asyncio.run(self.async_run())
  File "/root/miniconda3/envs/py311/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py311/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py311/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "./request_llm/bridge_newbingfree.py", line 98, in async_run
    async for final, response in self.newbing_model.ask_stream(
  File "./request_llm/edge_gpt_free.py", line 676, in ask_stream
    async for response in self.chat_hub.ask_stream(
  File "./request_llm/edge_gpt_free.py", line 456, in ask_stream
    self.wss = await self.session.ws_connect(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/py311/lib/python3.11/site-packages/aiohttp/client.py", line 795, in _ws_connect
    raise WSServerHandshakeError(
aiohttp.client_exceptions.WSServerHandshakeError: 200, message='Invalid response status', url=URL('wss://sydney.bing.com/sydney/ChatHub')
```
2023-07-19 04:39:15 +08:00
binary-husky
fd549fb986 merge success 2023-07-18 19:51:13 +08:00
binary-husky
babb775cfb interface with interlm 2023-07-18 16:33:34 +08:00
qingxu fu
eef9e470c9 LaTeX: resolve non-UTF-8 encoding errors 2023-07-18 11:00:20 +08:00
binary-husky
3002c6318a Update README.md 2023-07-17 22:21:39 +08:00
binary-husky
6d0bceaebd Remove plugin dependency 2023-07-17 22:00:29 +08:00
binary-husky
aa51d6fde6 up 2023-07-17 21:54:28 +08:00
binary-husky
136479e218 Update README.md 2023-07-17 10:38:46 +08:00
binary-husky
19a2742354 Merge pull request #957 from 1Haschwalth/patch-1
Update README.md
2023-07-17 10:35:15 +08:00
1Haschwalth
45aac96dd3 Update README.md 2023-07-16 21:50:08 +08:00
binary-husky
6f21ae8939 support claude api 2023-07-16 15:03:05 +08:00
binary-husky
add98f4eeb Fix automatic version upgrade bug 2023-07-16 13:23:28 +08:00
binary-husky
fe231f72b6 fix theme folder rename problem 2023-07-16 13:15:55 +08:00
binary-husky
b308fde480 update readme 2023-07-15 19:19:39 +08:00
binary-husky
f3e14ff806 Update the Traditional Chinese mapping dictionary 2023-07-15 19:11:00 +08:00
binary-husky
79ef9bdf1c update English projection dictionary 2023-07-15 19:01:49 +08:00
binary-husky
a3e938aee9 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-07-15 18:41:46 +08:00
binary-husky
b19a6155f4 restore jittor support 2023-07-15 18:41:35 +08:00
binary-husky
801f7342b1 Update config.py 2023-07-15 17:58:34 +08:00
binary-husky
4829fa0f35 Update README.md 2023-07-15 17:46:19 +08:00
binary-husky
3671f4208e Update README.md 2023-07-15 17:39:04 +08:00
binary-husky
e8c51181ee Further improve the real-time responsiveness of speech recognition 2023-07-15 17:02:00 +08:00
binary-husky
3ccbb4d6fb Remove Google Fonts 2023-07-15 17:01:37 +08:00
binary-husky
93fe457e99 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-07-15 16:41:46 +08:00
binary-husky
afac657aaa Fix watchdog thread leak in the audio assistant 2023-07-15 16:41:11 +08:00
binary-husky
3e5c32860a Update README.md 2023-07-15 14:59:05 +08:00
binary-husky
d577bb38b6 Update use_audio.md 2023-07-15 14:58:27 +08:00
binary-husky
418bc32b39 Update use_audio.md 2023-07-15 14:53:30 +08:00
binary-husky
7148ea0596 Update README 2023-07-15 14:44:07 +08:00
binary-husky
87adb17df4 3.46 2023-07-15 14:38:18 +08:00
binary-husky
3fcee3762d Tweak styles 2023-07-15 14:35:24 +08:00
binary-husky
1f014779e4 Tweak styles 2023-07-15 14:31:38 +08:00
binary-husky
97879e73ef Restore horizontal-resize CSS 2023-07-15 13:35:11 +08:00
binary-husky
13d4cd3237 Audio feature guide 2023-07-15 13:30:12 +08:00
binary-husky
73e835885b Merge branch 'master' into improve_ui_master 2023-07-15 13:01:13 +08:00
binary-husky
2524c908fc Revise hints 2023-07-15 12:58:38 +08:00
binary-husky
0e71d81bb3 Update README.md 2023-07-14 16:30:03 +08:00
binary-husky
a47864888f Update build-with-latex.yml 2023-07-14 16:25:25 +08:00
binary-husky
9b61ac807c Update build-with-chatglm.yml 2023-07-14 16:25:03 +08:00
binary-husky
bc200dc555 Update build-without-local-llms.yml 2023-07-14 16:24:32 +08:00
binary-husky
2c18b84517 Fix the automatic dependency installer 2023-07-12 22:16:25 +08:00
qingxu fu
fe7b651c56 Update hints 2023-07-11 15:56:28 +08:00
qingxu fu
9b8f160788 up 2023-07-11 15:52:38 +08:00
binary-husky
801d5e2fc2 audio readme 2023-07-11 11:11:06 +08:00
binary-husky
cecdd28e04 Update README.md 2023-07-10 03:41:19 +08:00
binary-husky
d364df1cd6 add test instance 2023-07-10 03:33:51 +08:00
binary-husky
f51bc03686 Version 3.45 release notes 2023-07-10 03:24:34 +08:00
binary-husky
c010d50716 Allow adding fine-tuned ChatGLM models 2023-07-10 03:17:09 +08:00
binary-husky
acddb86f3a Small and beautiful 2023-07-10 00:20:14 +08:00
binary-husky
4fde0120ab Refine reminders 2023-07-10 00:08:59 +08:00
binary-husky
592a354eef Refine plugin hints 2023-07-10 00:06:48 +08:00
binary-husky
bd66cf3d8b Fix conversation history issue 2023-07-10 00:02:22 +08:00
binary-husky
e6e5174734 Rename 2023-07-09 23:47:10 +08:00
binary-husky
13ade82677 Improve voice assistance 2023-07-09 23:18:06 +08:00
binary-husky
ce9eb8d20a UP 2023-07-09 21:18:04 +08:00
binary-husky
dd47c0a284 merge changes 2023-07-09 20:55:37 +08:00
binary-husky
f725ab1b31 Merge branch 'master' into improve_ui_master 2023-07-09 20:47:53 +08:00
binary-husky
7ce4192c52 add comments 2023-07-09 17:25:50 +08:00
binary-husky
c06aafb642 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-07-09 16:01:15 +08:00
binary-husky
b298c5416c Improve the PDF summarization plugin 2023-07-09 16:01:08 +08:00
505030475
94abf302cb Fix template comments 2023-07-09 12:50:51 +08:00
binary-husky
fcc5534e66 ChatGLM black-box fine-tuning plugin 2023-07-09 03:37:47 +08:00
binary-husky
56c0e4d575 3.44 release notes 2023-07-09 01:21:18 +08:00
binary-husky
8a10db618e Merge branch 'master-interact' 2023-07-09 01:05:04 +08:00
binary-husky
1fe66f0291 Improve the Azure experience 2023-07-09 00:20:58 +08:00
binary-husky
ced977c443 Fix double-dollar formula matching bug 2023-07-08 22:23:29 +08:00
binary-husky
6c2ffbae52 Update README.md 2023-07-08 19:17:35 +08:00
binary-husky
be2f54fac9 Update README.md 2023-07-08 18:21:20 +08:00
binary-husky
87b5e56378 Update requirements.txt 2023-07-08 18:10:33 +08:00
binary-husky
3a5764ed34 Update requirements.txt 2023-07-08 17:59:27 +08:00
qingxu fu
91aee50ea7 Chuanhu theme 2023-07-07 20:12:06 +08:00
qingxu fu
e5ccedf491 Revise names 2023-07-07 20:08:26 +08:00
qingxu fu
f620666a58 Merge branch 'improve_ui_master' of https://github.com/binary-husky/chatgpt_academic into improve_ui_master 2023-07-07 19:51:48 +08:00
qingxu fu
594c63e5d6 Theme fixes 2023-07-07 19:51:09 +08:00
qingxu fu
67d9051890 update error message 2023-07-07 17:41:43 +08:00
binary-husky
be96232127 Merge pull request #933 from binary-husky/master-latex-patch
Latex File Name Bug Patch
2023-07-07 16:57:58 +08:00
binary-husky
3b5bc7a784 Update use_azure.md 2023-07-07 10:55:22 +08:00
binary-husky
5e92f437a1 Update use_azure.md 2023-07-07 10:54:21 +08:00
qingxu fu
eabd9d312f 3.43 2023-07-07 10:47:30 +08:00
qingxu fu
0da6fe78ac Unify the azure-gpt-3.5 format 2023-07-07 10:45:11 +08:00
qingxu fu
be990380a0 Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-07-07 10:42:41 +08:00
qingxu fu
9c0bc48420 Fix various bugs in the Azure OpenAI interface 2023-07-07 10:42:38 +08:00
binary-husky
5c0d34793e Latex File Name Bug Patch 2023-07-07 00:09:50 +08:00
binary-husky
37fc550652 Update config.py 2023-07-06 10:47:06 +08:00
binary-husky
2c1d6ac212 Fix Organization bug 2023-07-05 21:14:13 +08:00
binary-husky
8c699c1b26 Update README.md 2023-07-05 21:04:28 +08:00
binary-husky
c620fa9011 Update README.md 2023-07-05 20:55:59 +08:00
binary-husky
f16fd60211 Update README.md 2023-07-05 20:34:22 +08:00
binary-husky
9674e59d26 Update notes 2023-07-05 20:22:57 +08:00
binary-husky
643c5e125a Update reminder 2023-07-05 20:10:18 +08:00
binary-husky
e5099e1daa In rare cases, an official OpenAI key must be accompanied by an organization ID 2023-07-05 20:05:20 +08:00
binary-husky
3e621bbec1 Update Dockerfile 2023-07-05 14:37:54 +08:00
qingxu fu
bb1d5a61c0 update translation matrix 2023-07-05 14:32:33 +08:00
binary-husky
fd3d0be2d8 Update config.py 2023-07-05 14:13:04 +08:00
binary-husky
ae623258f3 More detailed configuration hints 2023-07-05 14:10:06 +08:00
binary-husky
cda281f08b Add the New Bing cookie back 2023-07-05 13:48:50 +08:00
binary-husky
9f8e7a6efa Show more detailed error messages 2023-07-05 13:35:11 +08:00
qingxu fu
57643dd2b6 update error msg 2023-07-05 13:01:06 +08:00
qingxu fu
6bc8a78cfe No more cookie for NewBing! 2023-07-05 12:45:10 +08:00
binary-husky
d2700e97fb Update the OpenAI failure notice 2023-07-05 11:03:11 +08:00
binary-husky
c4dd81dc9a Update Dockerfile 2023-07-04 12:28:52 +08:00
binary-husky
e9b06d7cde Merge pull request #927 from QuantumRoseinAmethystVase/master
Update 批量总结PDF文档.py
2023-07-04 12:24:17 +08:00
qingxu fu
6e6ea69611 Unsplash is back 2023-07-04 12:16:01 +08:00
505030475
b082b5eb1b Move the Alibaba Cloud TOKEN into config 2023-07-03 23:20:25 +08:00
505030475
9648d78453 Refactor async code for readability 2023-07-03 22:44:10 +08:00
QuantumRoseinAmethystVase
16c17eb077 Update 批量总结PDF文档.py
Improve the output.
2023-07-03 18:55:16 +08:00
505030475
2dc8718041 First version of the audio module 2023-07-03 00:13:10 +08:00
505030475
a330d6636e error 2023-07-02 22:54:05 +08:00
qingxu fu
322c4be145 Synchronize audio input 2023-07-02 14:42:12 +08:00
qingxu fu
a3596ff60d audio 2023-07-02 01:05:20 +08:00
qingxu fu
e11d8132f8 add green theme 2023-07-01 23:02:44 +08:00
kainstan
59877dd728 Local variable 'result' might be referenced before assignment, add else result 2023-07-01 22:27:11 +08:00
w_xiaolizu
5f7ffef238 Add empty-value checks for basic functions 2023-07-01 22:04:42 +08:00
qingxu fu
41c10f5688 report image generation error in UI 2023-07-01 02:28:32 +08:00
qingxu fu
d7ac99f603 Correct error message 2023-07-01 01:46:43 +08:00
qingxu fu
1616daae6a Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-07-01 00:17:30 +08:00
qingxu fu
a1092d8f92 Provide an option to clear the input box automatically 2023-07-01 00:17:26 +08:00
binary-husky
34ca9f138f Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-06-30 14:56:28 +08:00
binary-husky
df3f1aa3ca Correct the default token count for ChatGLM2 2023-06-30 14:56:22 +08:00
qingxu fu
bf805cf477 Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-06-30 13:09:51 +08:00
qingxu fu
ecb08e69be remove find picture core functionality 2023-06-30 13:08:54 +08:00
binary-husky
28c1e3f11b Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-06-30 12:06:33 +08:00
binary-husky
403667aec1 upgrade chatglm to chatglm2 2023-06-30 12:06:28 +08:00
qingxu fu
22f377e2fb fix multi user cwd shift 2023-06-30 11:05:47 +08:00
binary-husky
37172906ef Fix a file export bug 2023-06-29 14:55:55 +08:00
binary-husky
3b78e0538b Fix image display in the plugin demo 2023-06-29 14:52:58 +08:00
binary-husky
d8f9ac71d0 Merge pull request #907 from Xminry/master
feat: web search feature, cn.bing.com version, usable in mainland China
2023-06-29 12:44:32 +08:00
qingxu fu
aced272d3c Fine-tune plugin hints 2023-06-29 12:43:50 +08:00
qingxu fu
aff77a086d Merge branch 'master' of https://github.com/Xminry/gpt_academic into Xminry-master 2023-06-29 12:38:43 +08:00
qingxu fu
49253c4dc6 [arxiv trans] add html comparison to zip file 2023-06-29 12:29:49 +08:00
qingxu fu
1a00093015 Fix hints 2023-06-29 12:15:52 +08:00
qingxu fu
64f76e7401 3.42 2023-06-29 11:32:19 +08:00
qingxu fu
eb4c07997e Fix LaTeX error correction and local LaTeX paper translation issues 2023-06-29 11:30:42 +08:00
Xminry
99cf7205c3 feat: web search feature, cn.bing.com version, usable in mainland China 2023-06-28 10:30:08 +08:00
binary-husky
d684b4cdb3 Merge pull request #905 from Xminry/master
Update 理解PDF文档内容.py
2023-06-27 23:37:25 +08:00
binary-husky
601a95c948 Merge pull request #881 from OverKit/master
update latex_utils.py
2023-06-27 19:20:17 +08:00
qingxu fu
e18bef2e9c add item breaker 2023-06-27 19:16:05 +08:00
qingxu fu
f654c1af31 merge regex expressions 2023-06-27 18:59:56 +08:00
qingxu fu
e90048a671 Merge branch 'master' of https://github.com/OverKit/gpt_academic into OverKit-master 2023-06-27 16:14:12 +08:00
binary-husky
ea624b1510 Merge pull request #889 from dackdawn/master
Declare the 0613 models
2023-06-27 15:03:15 +08:00
qingxu fu
057e3dda3c Merge branch 'master' of https://github.com/dackdawn/gpt_academic into dackdawn-master 2023-06-27 15:02:22 +08:00
Xminry
4290821a50 Update 理解PDF文档内容.py 2023-06-27 01:57:31 +08:00
binary-husky
280e14d7b7 Update the docker-compose for the LaTeX module 2023-06-26 09:59:14 +08:00
505030475
9f0cf9fb2b arxiv PDF citations 2023-06-25 23:30:31 +08:00
505030475
b8560b7510 Fix a bug that misidentified LaTeX template files 2023-06-25 22:46:16 +08:00
505030475
d841d13b04 add arxiv translation test samples 2023-06-25 22:12:44 +08:00
binary-husky
efda9e5193 Merge pull request #897 from Ranhuiryan/master
Add the azure-gpt35 option
2023-06-24 17:59:51 +10:00
Ranhuiryan
33d2e75aac add azure-gpt35 to model list 2023-06-21 16:19:49 +08:00
Ranhuiryan
74941170aa update azure use instruction 2023-06-21 16:19:26 +08:00
505030475
cd38949903 Fall back to the original text when an error occurs 2023-06-21 11:53:57 +10:00
505030475
d87f1eb171 Update instructions for connecting to Azure 2023-06-21 11:38:59 +10:00
binary-husky
cd1e4e1ba7 Merge pull request #797 from XiaojianTang/master
Add support for the Azure OpenAI API
2023-06-21 11:23:41 +10:00
505030475
cf5f348d70 update test samples 2023-06-21 11:20:31 +10:00
binary-husky
0ee25f475e Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-06-20 23:07:51 +08:00
binary-husky
1fede6df7f temp 2023-06-20 23:05:17 +08:00
binary-husky
22a65cd163 Create build-with-latex.yml 2023-06-21 00:55:24 +10:00
binary-husky
538b041ea3 Merge pull request #890 from Mcskiller/master
Update README.md
2023-06-21 00:53:26 +10:00
505030475
d7b056576d add latex docker-compose 2023-06-21 00:52:58 +10:00
505030475
cb0bb6ab4a fix minor bugs 2023-06-21 00:41:33 +10:00
505030475
bf955aaf12 fix bugs 2023-06-20 23:12:30 +10:00
505030475
61eb0da861 fix encoding bug 2023-06-20 22:08:09 +10:00
Lebenito(生糸)
5da633d94d Update README.md
Fix the error URL for the git clone.
2023-06-20 19:10:11 +08:00
dackdawn
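f3e4e26e2f Declare the 0613 models
OpenAI's RPM limit for gpt-3.5-turbo is 3, while the RPM for gpt-3.5-turbo-0613 is 60. Although the two models have the same content, selecting the specific model gets a higher RPM and TPM.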
2023-06-19 21:40:26 +08:00
505030475
af7734dd35 avoid file fusion 2023-06-19 16:57:11 +10:00
505030475
d5bab093f9 rename function names 2023-06-19 15:17:33 +10:00
505030475
f94b167dc2 Merge branch 'master' into overkit-master 2023-06-19 14:53:51 +10:00
505030475
951d5ec758 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-06-19 14:52:25 +10:00
505030475
016d8ee156 Merge remote-tracking branch 'origin/master' into OverKit-master 2023-06-19 14:51:59 +10:00
505030475
dca9ec4bae Merge branch 'master' of https://github.com/OverKit/gpt_academic into OverKit-master 2023-06-19 14:49:50 +10:00
binary-husky
a06e43c96b Update README.md 2023-06-18 16:15:37 +08:00
binary-husky
29c6bfb6cb Update README.md 2023-06-18 16:12:06 +08:00
binary-husky
8d7ee975a0 Update README.md 2023-06-18 16:10:45 +08:00
binary-husky
4bafbb3562 Update Latex输出PDF结果.py 2023-06-18 15:54:23 +08:00
OverKit
7fdf0a8e51 Adjust the code that separates content 2023-06-18 15:51:29 +08:00
binary-husky
2bb13b4677 Update README.md 2023-06-18 15:44:42 +08:00
OverKit
9a5a509dd9 Fix the search for abstract 2023-06-17 19:27:21 +08:00
binary-husky
cbcb98ef6a Merge pull request #872 from Skyzayre/master
Update README.md
2023-06-16 17:54:39 +08:00
qingxu fu
bb864c6313 Add some hint text 2023-06-16 17:33:19 +08:00
qingxu fu
6d849eeb12 Fix a bug in the Langchain plugin 2023-06-16 17:33:03 +08:00
Skyzayre
ef752838b0 Update README.md 2023-06-15 02:07:43 +08:00
binary-husky
73d4a1ff4b Update README.md 2023-06-14 10:15:47 +08:00
qingxu fu
8c62f21aa6 3.41: add support for gpt-3.5-16k 2023-06-14 09:57:09 +08:00
qingxu fu
c40ebfc21f Add gpt-3.5-16k to the supported list 2023-06-14 09:50:15 +08:00
binary-husky
c365ea9f57 Update README.md 2023-06-13 16:13:19 +08:00
binary-husky
12d66777cc Merge pull request #864 from OverKit/master
check letter % after removing spaces or tabs in the left
2023-06-12 15:21:35 +08:00
OverKit
9ac3d0d65d check letter % after removing spaces or tabs in the left 2023-06-12 10:09:52 +08:00
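The commit above checks for `%` only after stripping leading spaces and tabs, so that indented comment lines are still recognized. Below is a minimal sketch of that check. The `is_comment_line` name is hypothetical, and this is not the project's actual code.

```python
def is_comment_line(line: str) -> bool:
    """Return True if a LaTeX source line is a whole-line comment.

    Leading spaces and tabs are stripped before checking for '%', so an
    indented line such as '    % note' still counts as a comment.
    """
    return line.lstrip(" \t").startswith("%")
```

A line with a trailing comment after other text, such as `text % note`, returns False here. That case needs a separate pattern.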
binary-husky
9fd212652e Declare technical vocabulary 2023-06-12 09:45:59 +08:00
binary-husky
790a1cf12a Add some hints 2023-06-11 20:12:25 +08:00
binary-husky
3ecf2977a8 Fix caption translation 2023-06-11 18:23:54 +08:00
binary-husky
aeddf6b461 Update Latex输出PDF结果.py 2023-06-11 10:20:49 +08:00
505030475
ce0d8b9dab Prototype of the Void Terminal plugin 2023-06-11 01:36:23 +08:00
binary-husky
3c00e7a143 file link in chatbot 2023-06-10 21:45:38 +08:00
binary-husky
ef1bfdd60f update pip install notice 2023-06-08 21:29:10 +08:00
qingxu fu
e48d92e82e update translation 2023-06-08 18:34:06 +08:00
binary-husky
110510997f Update README.md 2023-06-08 12:48:52 +08:00
binary-husky
b52695845e Update README.md 2023-06-08 12:44:05 +08:00
binary-husky
f30c9c6d3b Update README.md 2023-06-08 12:43:13 +08:00
binary-husky
ff5403eac6 Update README.md 2023-06-08 12:42:24 +08:00
binary-husky
f9226d92be Update version 2023-06-08 12:24:14 +08:00
binary-husky
a0ea5d0e9e Update README.md 2023-06-08 12:22:03 +08:00
binary-husky
ce6f11d200 Update README.md 2023-06-08 12:20:49 +08:00
binary-husky
10b3001dba Update README.md 2023-06-08 12:19:11 +08:00
binary-husky
e2de1d76ea Update README.md 2023-06-08 12:18:31 +08:00
binary-husky
77cc141a82 Update README.md 2023-06-08 12:14:02 +08:00
binary-husky
526b4d8ecd Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-06-07 11:09:20 +08:00
binary-husky
149db621ec langchain check depends 2023-06-07 11:09:12 +08:00
binary-husky
2e1bb7311c Merge pull request #848 from MengDanzz/master
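Split the Dockerfile COPY into two steps to cache dependencies, so rebuilds do not reinstall them
2023-06-07 10:44:09 +08:00
binary-husky
dae65fd2c2 Run pip install once more after copy .. to check for dependency changes 2023-06-07 10:43:45 +08:00
MengDanzz
9aafb2ee47 Add non-PyPI packages to the COPY 2023-06-07 09:18:57 +08:00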
MengDanzz
6bc91bd02e Merge branch 'binary-husky:master' into master 2023-06-07 09:15:44 +08:00
qingxu fu
8ef7344101 fix subprocess bug in Windows 2023-06-06 18:57:52 +08:00
binary-husky
40da1b0afe Run the LaTeX decomposition program in a subprocess 2023-06-06 18:44:00 +08:00
MengDanzz
c65def90f3 Split the Dockerfile COPY into two steps to cache dependencies, so rebuilds do not reinstall them 2023-06-06 14:36:30 +08:00
binary-husky
ddeaf76422 check latex in PATH 2023-06-06 00:23:00 +08:00
qingxu fu
f23b66dec2 update Dockerfile with Latex 2023-06-05 23:49:54 +08:00
qingxu fu
a26b294817 Write Some Docstring 2023-06-05 23:44:59 +08:00
qingxu fu
66018840da declare resp 2023-06-05 23:24:41 +08:00
qingxu fu
cea2144f34 fix test samples 2023-06-05 23:11:21 +08:00
qingxu fu
7f5be93c1d Fix some regex matching bugs 2023-06-05 22:57:39 +08:00
binary-husky
85b838b302 add Linux support 2023-06-04 23:06:35 +08:00
qingxu fu
27f97ba92a remove previous results 2023-06-04 16:55:36 +08:00
qingxu fu
14269eba98 Set up a local arxiv cache 2023-06-04 16:08:01 +08:00
qingxu fu
d5c9bc9f0a Raise the priority of the iffalse search 2023-06-04 14:15:59 +08:00
qingxu fu
b0fed3edfc consider iffalse state 2023-06-04 14:06:02 +08:00
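The commits above make the LaTeX processing aware of `\iffalse ... \fi` blocks, which LaTeX skips when it compiles. Below is a minimal sketch of removing such blocks with a regular expression. It assumes that the blocks are not nested, and `strip_iffalse` is a hypothetical helper rather than the project's implementation.

```python
import re

# Matches from \iffalse to the first \fi that is a complete control word.
# Assumption: \iffalse blocks are not nested. The trailing \b stops longer
# control words such as \fill from being mistaken for \fi.
IFFALSE_BLOCK = re.compile(r"\\iffalse\b.*?\\fi\b", re.DOTALL)


def strip_iffalse(tex: str) -> str:
    """Remove \\iffalse ... \\fi blocks from a LaTeX source string."""
    return IFFALSE_BLOCK.sub("", tex)
```

The lazy `.*?` ends each match at the nearest `\fi`. For nested conditionals, a small stack-based scanner is more reliable than a single regex.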
qingxu fu
7296d054a2 patch latex segmentation 2023-06-04 13:56:15 +08:00
qingxu fu
d57c7d352d improve quality 2023-06-03 23:54:30 +08:00
qingxu fu
3fd2927ea3 Improvements 2023-06-03 23:33:45 +08:00
qingxu fu
b745074160 avoid most compile failure 2023-06-03 23:33:32 +08:00
qingxu fu
70ee810133 improve success rate 2023-06-03 19:39:19 +08:00
qingxu fu
68fea9e79b fix test 2023-06-03 18:09:39 +08:00
qingxu fu
f82bf91aa8 test example 2023-06-03 18:06:39 +08:00
qingxu fu
dde9edcc0c fix a fatal mistake 2023-06-03 17:49:22 +08:00
qingxu fu
66c78e459e Correct hints 2023-06-03 17:18:38 +08:00
qingxu fu
de54102303 Revise reminders 2023-06-03 16:43:26 +08:00
qingxu fu
7c7d2d8a84 Patch for LaTeX minipage 2023-06-03 16:16:32 +08:00
qingxu fu
834f989ed4 Handle \input used without the .tex extension 2023-06-03 15:42:22 +08:00
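The commit above handles `\input` commands that omit the `.tex` extension. Below is a minimal sketch that resolves such arguments to file paths. It uses a simplified rule that appends `.tex` whenever the extension is missing, whereas real TeX tries `name.tex` first and then `name` as given. The function names are hypothetical.

```python
import os
import re

# Captures the argument of \input{...}.
INPUT_CMD = re.compile(r"\\input\{([^}]+)\}")


def resolve_input(project_dir: str, name: str) -> str:
    """Map an \\input argument to a file path, adding .tex when it is omitted."""
    if not name.endswith(".tex"):
        name += ".tex"
    return os.path.join(project_dir, name)


def find_inputs(tex: str, project_dir: str) -> list:
    """Return the file paths of all \\input commands in a LaTeX source string."""
    return [resolve_input(project_dir, arg.strip()) for arg in INPUT_CMD.findall(tex)]
```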
qingxu fu
b658ee6e04 Fix some issues in arxiv translation 2023-06-03 15:36:55 +08:00
qingxu fu
1a60280ea0 Add a warning 2023-06-03 14:40:37 +08:00
qingxu fu
991cb7d272 warning 2023-06-03 14:39:40 +08:00
qingxu fu
463991cfb2 fix bug 2023-06-03 14:24:06 +08:00
qingxu fu
06f10b5fdc fix zh cite bug 2023-06-03 14:17:58 +08:00
qingxu fu
d275d012c6 Merge branch 'langchain' into master 2023-06-03 13:53:39 +08:00
qingxu fu
c5d1ea3e21 update langchain version 2023-06-03 13:53:34 +08:00
qingxu fu
0022b92404 update prompt 2023-06-03 13:50:39 +08:00
qingxu fu
ef61221241 latex auto translation milestone 2023-06-03 13:46:40 +08:00
qingxu fu
5a1831db98 Success! 2023-06-03 00:34:23 +08:00
qingxu fu
a643f8b0db debug translation 2023-06-02 23:06:01 +08:00
qingxu fu
601712fd0a latex toolchain 2023-06-02 21:44:11 +08:00
505030475
e769f831c7 latex 2023-06-02 14:07:04 +08:00
binary-husky
dcd952671f Update main.py 2023-06-01 15:56:52 +08:00
binary-husky
06564df038 Merge branch 'langchain' 2023-06-01 09:39:34 +08:00
binary-husky
2f037f30d5 Temporarily remove plugin locking 2023-06-01 09:39:00 +08:00
505030475
efedab186d Merge branch 'master' into langchain 2023-06-01 00:10:22 +08:00
binary-husky
f49cae5116 Update Langchain知识库.py 2023-06-01 00:09:07 +08:00
binary-husky
2b620ccf2e Update hints 2023-06-01 00:07:19 +08:00
binary-husky
a1b7a4da56 Update test cases 2023-06-01 00:03:27 +08:00
binary-husky
61b0e49fed fix some bugs in linux 2023-05-31 23:49:25 +08:00
binary-husky
f60dc371db 12 2023-05-31 10:42:44 +08:00
binary-husky
0a3433b8ac Update README.md 2023-05-31 10:37:08 +08:00
binary-husky
31bce54abb Update README.md 2023-05-31 10:34:21 +08:00
binary-husky
5db1530717 Merge branch 'langchain' of github.com:binary-husky/chatgpt_academic into langchain 2023-05-30 20:08:47 +08:00
binary-husky
c32929fd11 Merge branch 'master' into langchain 2023-05-30 20:08:15 +08:00
505030475
3e4c2b056c knowledge base 2023-05-30 19:55:38 +08:00
505030475
e79e9d7d23 Merge branch 'master' into langchain 2023-05-30 18:31:39 +08:00
binary-husky
d175b93072 Update README.md.Italian.md 2023-05-30 17:27:41 +08:00
binary-husky
ed254687d2 Update README.md.Italian.md 2023-05-30 17:26:12 +08:00
binary-husky
c0392f7074 Update README.md.Korean.md 2023-05-30 17:25:32 +08:00
binary-husky
f437712af7 Update README.md.Portuguese.md 2023-05-30 17:22:46 +08:00
505030475
6d1ea643e9 langchain 2023-05-30 12:54:42 +08:00
binary-husky
9e84cfcd46 Update README.md 2023-05-29 19:48:34 +08:00
binary-husky
897695d29f Fix file masking for second-level paths 2023-05-28 20:25:35 +08:00
binary-husky
1dcc2873d2 Fix Gradio configuration leak 2023-05-28 20:23:47 +08:00
binary-husky
42cf738a31 Fix the copy button not working in some cases 2023-05-28 18:12:48 +08:00
binary-husky
e4646789af Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-05-28 16:07:29 +08:00
binary-husky
e6c3aabd45 docker-compose check 2023-05-28 16:07:24 +08:00
binary-husky
6789d1fab4 Update README.md 2023-05-28 11:21:50 +08:00
binary-husky
7a733f00a2 Update README.md 2023-05-28 00:19:23 +08:00
binary-husky
dd55888f0e Update README.md 2023-05-28 00:16:45 +08:00
binary-husky
0327df22eb Update README.md 2023-05-28 00:14:54 +08:00
binary-husky
e544f5e9d0 Update README.md 2023-05-27 23:45:15 +08:00
binary-husky
0fad4f44a4 fix dockerfile 2023-05-27 23:36:42 +08:00
binary-husky
1240dd6f26 local gradio 2023-05-27 23:29:22 +08:00
505030475
d6be947177 Fix gradio dependency installation 2023-05-27 23:10:44 +08:00
505030475
3cfbdce9f2 remove limitation for now 2023-05-27 22:25:50 +08:00
505030475
1ee471ff57 fix reminder 2023-05-27 22:20:46 +08:00
binary-husky
25ccecf8e3 Update README.md 2023-05-27 21:56:43 +08:00
binary-husky
9e991bfa3e Update requirements.txt 2023-05-27 21:56:16 +08:00
binary-husky
221efd0193 Update README.md 2023-05-27 21:11:25 +08:00
binary-husky
976b9bf65f Update README.md 2023-05-27 21:04:52 +08:00
binary-husky
ae5783e383 Fix a bug in the gradio copy button 2023-05-27 20:20:45 +08:00
binary-husky
30224af042 Merge pull request #798 from Bit0r/master
🐛 Regex for matching LaTeX comments
2023-05-27 14:03:07 +08:00
Bit0r
8ff7c15cd8 🐛 Regex for matching LaTeX comments 2023-05-27 11:19:48 +08:00
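The commit above concerns a regular expression for matching LaTeX comments. Below is a minimal sketch of one common approach: a negative lookbehind makes the escaped literal `\%` survive while real comments are removed. It is not necessarily the pattern the commit uses, and `strip_comments` is a hypothetical name.

```python
import re

# '%' starts a comment unless it is written as the escaped literal '\%'.
# '.' does not match newlines, so each match stops at the end of its line.
# Assumption: the rare '\\%' case (a line break followed by a comment) is not handled.
COMMENT = re.compile(r"(?<!\\)%.*")


def strip_comments(tex: str) -> str:
    return COMMENT.sub("", tex)
```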
XiaojianTang
f3205994ea Add support for the Azure OpenAI API 2023-05-26 23:22:12 +08:00
505030475
ec8cc48a4d Add ProxyNetworkActivate 2023-05-25 23:48:18 +08:00
binary-husky
5d75c578b9 fix dependency 2023-05-25 15:28:27 +08:00
binary-husky
cd411c2eea newbing-free deps 2023-05-25 15:12:54 +08:00
binary-husky
bb2f276ba5 remove duplicate 2023-05-25 15:00:07 +08:00
qingxu fu
348e50c0c9 up 2023-05-25 14:56:54 +08:00
qingxu fu
9d7fc31706 up 2023-05-25 14:56:16 +08:00
qingxu fu
3108b4a426 fix format 2023-05-25 14:23:35 +08:00
qingxu fu
3da12b5bf7 readme translation 2023-05-25 14:20:20 +08:00
qingxu fu
12710ff1fa Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-05-25 13:49:56 +08:00
qingxu fu
e7df3a551d up 2023-05-25 13:49:51 +08:00
qingxu fu
7947c968ad The translation language for Markdown can now be specified 2023-05-25 13:46:50 +08:00
binary-husky
3dd15dee61 Update multi_language.py 2023-05-25 13:13:23 +08:00
binary-husky
b4f0be329b Update multi_language.py 2023-05-25 13:11:31 +08:00
binary-husky
e3f903d132 Update multi_language.py 2023-05-25 13:07:37 +08:00
binary-husky
e18ab0afc0 Update multi_language.py 2023-05-25 13:06:34 +08:00
binary-husky
2b61556acc Update README.md 2023-05-25 13:01:22 +08:00
qingxu fu
51c075ec3c update English translation 2023-05-25 12:50:33 +08:00
qingxu fu
e22f1917b2 update note 2023-05-25 12:48:20 +08:00
qingxu fu
ed53442942 up 2023-05-25 12:39:41 +08:00
qingxu fu
fad502a938 up 2023-05-25 12:32:39 +08:00
qingxu fu
4c0c1034db up 2023-05-25 12:32:10 +08:00
qingxu fu
1c029e1276 up 2023-05-25 12:31:31 +08:00
qingxu fu
bcfc0f0f74 up 2023-05-25 12:20:22 +08:00
qingxu fu
bc8dc7f102 up 2023-05-25 12:15:23 +08:00
qingxu fu
a099f98f0e fix bug 2023-05-25 12:14:03 +08:00
qingxu fu
2887720999 Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-05-25 11:36:38 +08:00
qingxu fu
cc0e0a90a6 down 2023-05-25 11:36:35 +08:00
binary-husky
9256bcf68e Update feature_request.yml 2023-05-25 10:17:37 +08:00
binary-husky
e6cc28b0f6 Update and rename feature_request.md to feature_request.yml 2023-05-25 10:16:16 +08:00
binary-husky
e8bed9ce85 Update config.py 2023-05-25 10:10:33 +08:00
qingxu fu
582010e6a1 Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-05-25 01:38:09 +08:00
qingxu fu
dd05f29d66 update self analysis 2023-05-25 01:38:06 +08:00
binary-husky
746a607652 Update README.md 2023-05-25 01:33:30 +08:00
binary-husky
b87592f43d Update README.md 2023-05-25 01:31:32 +08:00
binary-husky
b9ec396d08 Update README.md 2023-05-25 01:30:49 +08:00
qingxu fu
293ad9052d Improve source code analysis so it can handle more files 2023-05-25 01:15:24 +08:00
qingxu fu
e6f292c14b Fix the last finished thread not updating its status 2023-05-25 01:04:26 +08:00
binary-husky
0bda5c54ed Update README.md 2023-05-25 00:27:19 +08:00
qingxu fu
bc613c74af Merge branch 'master' of https://github.com/binary-husky/chatgpt_academic into master 2023-05-25 00:24:32 +08:00
qingxu fu
35c3c0f2c6 Add proofreading and error correction for LaTeX papers 2023-05-25 00:24:29 +08:00
binary-husky
cd3f2860f8 Update README.md 2023-05-25 00:22:29 +08:00
binary-husky
2fa9aa233c Update README.md 2023-05-24 21:13:23 +08:00
binary-husky
1275f77986 Update README.md 2023-05-24 21:11:41 +08:00
binary-husky
f0f88f5f48 Update README.md 2023-05-24 21:11:10 +08:00
qingxu fu
42eef1bea7 add free newbing without cookie using edge-gpt 2023-05-24 10:42:11 +08:00
binary-husky
728eba04ec Update README.md 2023-05-23 17:13:53 +08:00
binary-husky
694f12c97d Update bug_report.yml 2023-05-23 17:06:23 +08:00
binary-husky
a075e9631d Update bug_report.yml 2023-05-23 12:36:02 +08:00
binary-husky
ee84c144dd Update version 3.36 2023-05-23 00:08:04 +08:00
505030475
fffb78e7af Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-05-23 00:05:27 +08:00
505030475
db16e85d8c Fix PDF translation issues 2023-05-23 00:05:00 +08:00
binary-husky
72b412267d Merge pull request #776 from ChristLZS/master
support rust program
2023-05-22 22:34:37 +08:00
li zhisheng
e2137b896e [main] support rust program 2023-05-22 19:27:38 +08:00
505030475
6d557b3c34 fix history commit problem 2023-05-20 13:54:19 +08:00
binary-husky
76e0452619 Add the ability to translate the project into any language (testing) 2023-05-20 13:42:14 +08:00
binary-husky
e62c0b30ae Merge pull request #767 from binary-husky/multi_language
Add Multi Language Support
2023-05-20 13:40:55 +08:00
505030475
d29f524cec Merge remote-tracking branch 'origin/master' into multi_language 2023-05-20 13:36:23 +08:00
505030475
b7e08229fa add user explaination 2023-05-20 13:35:31 +08:00
505030475
e38e6e22f5 multi-lan 2023-05-20 13:32:06 +08:00
505030475
f05862c854 Json is good 2023-05-20 13:01:58 +08:00
505030475
fc762cbf7f stage one 2023-05-20 12:23:46 +08:00
505030475
c376e46f4d translate not fin 2023-05-19 23:52:20 +08:00
qingxu fu
8d528190a9 rt 2023-05-19 13:23:44 +08:00
binary-husky
d2fa4c80eb Update config.py 2023-05-19 13:00:38 +08:00
binary-husky
212ca0c0b9 3.35 2023-05-19 12:51:43 +08:00
binary-husky
c32c585384 Audio-to-text + summarization 2023-05-19 12:25:58 +08:00
binary-husky
62a596ef30 Merge pull request #742 from FutureUnreal/new_branch
Add batch summarization of audio and video
2023-05-19 12:25:13 +08:00
binary-husky
7d8338ce70 Allow advanced parameter instructions for audio-to-text 2023-05-19 12:24:04 +08:00
binary-husky
c46a8d27e6 Fix a parameter default value bug 2023-05-19 12:23:01 +08:00
binary-husky
d8540d42a6 move dep 2023-05-19 11:22:25 +08:00
binary-husky
f30bee2409 Merge branch 'new_branch' of github.com:FutureUnreal/gpt_academic into FutureUnreal-new_branch 2023-05-19 11:20:18 +08:00
binary-husky
c7841fd998 Merge pull request #727 from CSUMaVeRick/master
Share a function that can transform bibliography items into BibTex style
2023-05-19 11:17:47 +08:00
binary-husky
254fac0045 move moss folder to gitignore 2023-05-19 11:16:53 +08:00
binary-husky
5159a1e7a1 core function: hide functions 2023-05-19 11:14:44 +08:00
binary-husky
e2d75f1b62 remove yml 2023-05-19 11:09:30 +08:00
binary-husky
4f77c27d6d Merge branch 'master' of github.com:CSUMaVeRick/gpt_academic into CSUMaVeRick-master 2023-05-19 11:07:59 +08:00
binary-husky
e7080e671d Merge pull request #746 from Rid7/claude
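Integrate the Claude in Slack service; message history settings are not supported yet (single Slack instance; beware of privacy risks when multiple people use it)
2023-05-19 11:02:58 +08:00
qingxu fu
b0c2e2d92b Revise hints 2023-05-19 10:58:22 +08:00
qingxu fu
77a2d62ef6 Catch exceptions when dependencies are missing 2023-05-19 10:55:50 +08:00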
qingxu fu
c43e22bc41 change claude model name to stack-claude 2023-05-19 10:46:12 +08:00
qingxu fu
be6b42324d Merge branch 'claude' of github.com:Rid7/gpt_academic into Rid7-claude 2023-05-19 09:39:47 +08:00
505030475
3951159d55 ml 2023-05-18 14:39:57 +08:00
505030475
6c448b9a60 translate efficient 2023-05-16 01:05:25 +08:00
505030475
43e64782dc Fix error display for unofficial OpenAI reverse proxies 2023-05-16 00:35:47 +08:00
binary-husky
5f79fed566 Merge pull request #748 from duhaode520/master
🐞 fix(Google Scholar search): wrap search.results() to prevent errors when it is empty
2023-05-15 17:27:41 +08:00
binary-husky
f2a55dc769 Update bug_report.yml 2023-05-15 17:22:52 +08:00
duhaode520
3f31fb9990 🐞 fix(Google Scholar search): wrap search.results() to prevent errors when it is empty
https://github.com/binary-husky/gpt_academic/issues/423
2023-05-15 08:11:13 +00:00
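The fix above guards against `search.results()` returning nothing. Below is a minimal sketch of one way to guard that case. It assumes that `results()` returns an iterable or generator, and `first_or_none` is a hypothetical helper rather than the plugin's code.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first_or_none(results: Iterable[T]) -> Optional[T]:
    """Return the first search result, or None when there are none.

    next() on an exhausted iterator raises StopIteration. The default
    argument turns the empty case into a value the caller can check.
    """
    return next(iter(results), None)
```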
Rid7
d795dc1a81 Stop calling claude_model's reset method on reset 2023-05-15 15:47:05 +08:00
Rid7
f90ec93dfc Merge remote-tracking branch 'origin/claude' into claude 2023-05-15 15:18:03 +08:00
Rid7
6d267947bb Implement Claude chat configuration options 2023-05-15 15:12:50 +08:00
Rid7
595e5cceae Implement Claude chat 2023-05-15 15:07:53 +08:00
Rid7
2291a67cf8 Implement Claude chat 2023-05-15 14:27:31 +08:00
binary-husky
c0e57e0e39 fix bool env read bug 2023-05-14 15:18:33 +08:00
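A common source of boolean environment-variable bugs is that `bool("False")` is `True` in Python. Below is a minimal sketch of reading such a setting safely. It does not claim to show what the commit above changed, and `read_bool_env` and its accepted spellings are assumptions.

```python
import os


def read_bool_env(name: str, default: bool = False) -> bool:
    """Read a boolean setting from an environment variable.

    bool("False") is True in Python because every non-empty string is truthy,
    so the text is compared against an explicit set of true-like values.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")
```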
‘dalvqw’
dcd5f7996e Add batch summarization of audio and video 2023-05-14 12:51:33 +08:00
505030475
303e4dd617 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-05-13 14:20:46 +08:00
505030475
d52c0c4783 Change output format 2023-05-13 14:20:34 +08:00
binary-husky
e4de1549a3 Update README.md 2023-05-13 14:07:42 +08:00
505030475
986653b43e resolution 2023-05-13 14:00:07 +08:00
505030475
08e184ea55 Add an image generation API plugin 2023-05-13 00:28:29 +08:00
505030475
fdb9650cca word file format reminder 2023-05-12 23:05:16 +08:00
binary-husky
dadbb71147 Update bridge_chatgpt.py 2023-05-11 18:42:51 +08:00
binary-husky
18a59598ea Update README.md 2023-05-11 18:11:19 +08:00
CSUMaVeRick
57297605e2 Update core_functional.py 2023-05-11 13:42:51 +08:00
binary-husky
1134ec2df5 Update README.md 2023-05-08 20:33:47 +08:00
binary-husky
f54872007f Update README.md 2023-05-08 20:33:32 +08:00
binary-husky
24a832608c Update README.md 2023-05-08 20:32:18 +08:00
binary-husky
2fa52f71e7 Update README.md 2023-05-08 20:31:35 +08:00
binary-husky
00e7fbd7fa Update README.md 2023-05-08 20:27:18 +08:00
binary-husky
397dc2d0dc Update README.md 2023-05-08 20:22:43 +08:00
binary-husky
98269e8708 Update README.md 2023-05-08 20:21:28 +08:00
binary-husky
1bb45d4998 Update docker-compose.yml 2023-05-08 20:16:43 +08:00
binary-husky
8f9c5c5039 Update README.md 2023-05-08 20:13:32 +08:00
binary-husky
88ac4cf0a7 Update README.md 2023-05-08 20:12:38 +08:00
fuqingxu
624d203bbc update docker compose 2023-05-08 20:09:54 +08:00
fuqingxu
84fc8647f7 Fix environment dependencies for moss and chatglm 2023-05-08 20:06:41 +08:00
fuqingxu
a554b7f0e4 Merge branch 'master' of https://github.com/binary-husky/gpt_academic 2023-05-08 19:23:21 +08:00
fuqingxu
777850200d update the error handling of moss and chatglm 2023-05-08 19:21:17 +08:00
binary-husky
3f251e4571 Update bug_report.yml 2023-05-08 18:45:23 +08:00
binary-husky
2dd65af9f0 Update bug_report.yml 2023-05-08 18:42:52 +08:00
binary-husky
f8209e51f5 Update bug_report.yml 2023-05-08 18:40:35 +08:00
binary-husky
111a65e9e8 Update bug_report.yml 2023-05-08 18:34:55 +08:00
binary-husky
c0ed2131f0 Update and rename bug_report.md to bug_report.yml 2023-05-08 18:33:41 +08:00
binary-husky
10882b677d Update README.md 2023-05-07 22:54:29 +08:00
binary-husky
aed1b20ada Update GithubAction+ChatGLM+Moss 2023-05-07 17:13:51 +08:00
505030475
68bdec12c0 try jittor build 2023-05-07 16:47:20 +08:00
505030475
1404811845 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-05-07 16:40:49 +08:00
505030475
e92ae1eb2c Try Github Actions 2023-05-07 16:40:41 +08:00
binary-husky
0d0890cb92 Update and rename docker-image.yml to build-without-local-llms.yml 2023-05-07 16:40:13 +08:00
binary-husky
a76f275691 Create build-with-chatglm.yml 2023-05-07 16:38:49 +08:00
binary-husky
cfcd45b8b9 Update docker-image.yml 2023-05-07 16:22:10 +08:00
binary-husky
9c72a6f6e9 Update docker-image.yml 2023-05-07 16:11:36 +08:00
binary-husky
da4e483d80 Update docker-image.yml 2023-05-07 16:08:03 +08:00
binary-husky
41f801129a Update docker-image.yml 2023-05-07 15:55:42 +08:00
binary-husky
caf7bf2b9a Create docker-image.yml 2023-05-07 15:55:14 +08:00
505030475
986e6461ed reset github action 2023-05-07 15:54:22 +08:00
505030475
29d027087b Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-05-07 15:50:45 +08:00
505030475
7a687347e1 Edit comments 2023-05-07 15:50:34 +08:00
binary-husky
5b9a1e9531 Update docker-image.yml 2023-05-07 15:46:49 +08:00
binary-husky
b1154b368c Update docker-image.yml 2023-05-07 15:44:44 +08:00
505030475
4f0cd42117 Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-05-07 15:37:56 +08:00
505030475
f5ccc8bdc6 GithubAction Test 2023-05-07 15:37:47 +08:00
binary-husky
62d5775b79 Create docker-image.yml
experimental docker build action
2023-05-07 15:26:49 +08:00
binary-husky
00eb17b2e7 Update README.md 2023-05-07 15:08:53 +08:00
binary-husky
3c5df9c02e Update README.md 2023-05-07 14:47:46 +08:00
505030475
1626fbd9d6 version 3.34 2023-05-07 14:19:39 +08:00
binary-husky
36ff2092d7 Adapt the dark theme to the new gradio version 2023-05-07 14:13:57 +08:00
binary-husky
3cf9c88891 Adapt dark mode to the new gradio version 2023-05-07 14:12:37 +08:00
binary-husky
78045001f2 Update README.md 2023-05-07 14:11:54 +08:00
binary-husky
5c57816230 Update README.md 2023-05-07 01:46:07 +08:00
binary-husky
fa395aac6e Update README.md 2023-05-07 01:42:43 +08:00
binary-husky
8dded0c435 Update README.md 2023-05-07 01:32:47 +08:00
binary-husky
933a865b10 Notes on MOSS support 2023-05-07 01:27:50 +08:00
binary-husky
6b8b14b11e Merge branch 'master' of github.com:binary-husky/chatgpt_academic 2023-05-07 01:05:19 +08:00
binary-husky
5102ec8263 Add support for Fudan University's MOSS 2023-05-07 01:04:59 +08:00
binary-husky
c1e4db243d Update README.md 2023-05-07 00:03:40 +08:00
binary-husky
4b9078a9dc merge jittor branch 2023-05-06 23:39:57 +08:00
binary-husky
62d14cfa3f Merge pull request #695 from Undertone0809/master
fix: resolve keyerror 'serialized_input' for mac/windows platform
2023-05-06 22:29:39 +08:00
binary-husky
bd6ec158d4 Merge branch 'master' into master 2023-05-06 22:29:28 +08:00
binary-husky
d2f04e2dd2 Update requirements.txt 2023-05-06 22:28:37 +08:00
binary-husky
b47054c479 Update requirements.txt 2023-05-06 22:18:23 +08:00
Zeeland
15c40bdaff fix: resolve keyerror 'serialized_input' for windows platform 2023-05-06 17:05:24 +08:00
binary-husky
44a71fdbf1 Update README.md 2023-05-06 10:32:36 +08:00
binary-husky
996a0486af Update README.md 2023-05-06 10:30:27 +08:00
binary-husky
a15eb56ee8 Update README.md 2023-05-05 18:22:52 +08:00
binary-husky
daef87da41 Update README.md 2023-05-05 18:19:42 +08:00
binary-husky
0b4d68fbee Update README.md 2023-05-05 18:17:52 +08:00
binary-husky
9f3d67e7bd Update docker-compose.yml 2023-05-05 17:59:14 +08:00
binary-husky
47866ebe0e Update docker-compose.yml 2023-05-05 17:58:41 +08:00
CSUMaVeRick
30de8f1358 Add or update the Azure App Service build and deployment workflow config 2023-05-04 00:52:12 +08:00
171 files changed, with 24230 insertions and 15069 deletions


@@ -1,25 +0,0 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''
---
- **(1) Describe the bug 简述**
- **(2) Screen Shot 截图**
- **(3) Terminal Traceback 终端traceback如有**
- **(4) Material to Help Reproduce Bugs 帮助我们复现的测试材料样本(如有)**
Before submitting an issue 提交issue之前
- Please try to upgrade your code. 如果您的代码不是最新的,建议您先尝试更新代码
- Please check project wiki for common problem solutions.项目[wiki](https://github.com/binary-husky/chatgpt_academic/wiki)有一些常见问题的解决方法

.github/ISSUE_TEMPLATE/bug_report.yml

@@ -0,0 +1,77 @@
name: Report Bug | 报告BUG
description: "Report bug"
title: "[Bug]: "
labels: []
body:
  - type: dropdown
    id: download
    attributes:
      label: Installation Method | 安装方法与平台
      options:
        - Please choose | 请选择
        - Pip Install (I ignored requirements.txt)
        - Pip Install (I used latest requirements.txt)
        - OneKeyInstall (一键安装脚本-windows)
        - OneKeyInstall (一键安装脚本-mac)
        - Anaconda (I ignored requirements.txt)
        - Anaconda (I used latest requirements.txt)
        - DockerWindows/Mac
        - DockerLinux
        - Docker-ComposeWindows/Mac
        - Docker-ComposeLinux
        - Huggingface
        - Others (Please Describe)
    validations:
      required: true
  - type: dropdown
    id: version
    attributes:
      label: Version | 版本
      options:
        - Please choose | 请选择
        - Latest | 最新版
        - Others | 非最新版
    validations:
      required: true
  - type: dropdown
    id: os
    attributes:
      label: OS | 操作系统
      options:
        - Please choose | 请选择
        - Windows
        - Mac
        - Linux
        - Docker
    validations:
      required: true
  - type: textarea
    id: describe
    attributes:
      label: Describe the bug | 简述
      description: Describe the bug | 简述
    validations:
      required: true
  - type: textarea
    id: screenshot
    attributes:
      label: Screen Shot | 有帮助的截图
      description: Screen Shot | 有帮助的截图
    validations:
      required: true
  - type: textarea
    id: traceback
    attributes:
      label: Terminal Traceback & Material to Help Reproduce Bugs | 终端traceback如有 + 帮助我们复现的测试材料样本(如有)
      description: Terminal Traceback & Material to Help Reproduce Bugs | 终端traceback如有 + 帮助我们复现的测试材料样本(如有)


@@ -1,10 +0,0 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---


@@ -0,0 +1,28 @@
name: Feature Request | 功能请求
description: "Feature Request"
title: "[Feature]: "
labels: []
body:
  - type: dropdown
    id: download
    attributes:
      label: Class | 类型
      options:
        - Please choose | 请选择
        - 其他  # Other
        - 函数插件  # Function plugin
        - 大语言模型  # Large language model
        - 程序主体  # Core program
    validations:
      required: false
  - type: textarea
    id: traceback
    attributes:
      label: Feature Request | 功能请求
      description: Feature Request | 功能请求


@@ -0,0 +1,44 @@
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
name: build-with-all-capacity
on:
  push:
    branches:
      - 'master'
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}_with_all_capacity
jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Log in to the Container registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          file: docs/GithubAction+AllCapacity
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}


@@ -0,0 +1,44 @@
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
name: build-with-audio-assistant
on:
  push:
    branches:
      - 'master'
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}_audio_assistant
jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Log in to the Container registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          file: docs/GithubAction+NoLocal+AudioAssistant
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

.github/workflows/build-with-chatglm.yml

@@ -0,0 +1,44 @@
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
name: build-with-chatglm
on:
  push:
    branches:
      - 'master'
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}_chatglm_moss
jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Log in to the Container registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          file: docs/GithubAction+ChatGLM+Moss
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}


@@ -0,0 +1,44 @@
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
name: build-with-jittorllms
on:
  push:
    branches:
      - 'master'
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}_jittorllms
jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Log in to the Container registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          file: docs/GithubAction+JittorLLMs
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

.github/workflows/build-with-latex.yml

@@ -0,0 +1,44 @@
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
name: build-with-latex
on:
push:
branches:
- 'master'
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}_with_latex
jobs:
build-and-push-image:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Log in to the Container registry
uses: docker/login-action@v2
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v4
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
- name: Build and push Docker image
uses: docker/build-push-action@v4
with:
context: .
push: true
file: docs/GithubAction+NoLocal+Latex
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}


@@ -0,0 +1,44 @@
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
name: build-without-local-llms
on:
push:
branches:
- 'master'
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}_nolocal
jobs:
build-and-push-image:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Log in to the Container registry
uses: docker/login-action@v2
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@v4
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
- name: Build and push Docker image
uses: docker/build-push-action@v4
with:
context: .
push: true
file: docs/GithubAction+NoLocal
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}

.github/workflows/stale.yml

@@ -0,0 +1,25 @@
# This workflow warns and then closes issues and PRs that have had no activity for a specified amount of time.
#
# You can adjust the behavior by modifying this file.
# For more information, see:
# https://github.com/actions/stale
name: 'Close stale issues and PRs'
on:
schedule:
- cron: '*/5 * * * *'
jobs:
stale:
runs-on: ubuntu-latest
permissions:
issues: write
pull-requests: read
steps:
- uses: actions/stale@v8
with:
stale-issue-message: 'This issue is stale because it has been open 100 days with no activity. Remove the stale label or comment, or this will be closed in 1 day.'
days-before-stale: 100
days-before-close: 1
debug-only: true
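
The stale workflow labels an issue after `days-before-stale` days without activity and closes it `days-before-close` days later. With `debug-only: true`, the action runs as a dry run and changes nothing on GitHub. Below is a simplified Python model of that decision. It ignores exempt labels, pull-request handling and the per-run operations budget that the real action also applies. `stale_action` and its arguments are hypothetical.

```python
from datetime import date
from typing import Optional


def stale_action(last_activity: date, stale_since: Optional[date], today: date,
                 days_before_stale: int = 100, days_before_close: int = 1,
                 debug_only: bool = True) -> str:
    """Return the action a stale bot would take on one issue (simplified model)."""
    if stale_since is None:
        # Not labeled yet: only the inactivity window matters.
        if (today - last_activity).days >= days_before_stale:
            action = "mark-stale"
        else:
            return "none"
    else:
        # Already labeled: new activity removes the label, otherwise close after the grace period.
        if last_activity > stale_since:
            action = "unmark-stale"
        elif (today - stale_since).days >= days_before_close:
            action = "close"
        else:
            return "none"
    # debug-only mode reports what would happen without changing anything.
    return f"dry-run:{action}" if debug_only else action
```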

.gitignore

@@ -146,4 +146,9 @@ debug*
private*
crazy_functions/test_project/pdf_and_word
crazy_functions/test_samples
request_llm/jittorllms
request_llm/jittorllms
multi-language
request_llm/moss
media
flagged
request_llm/ChatGLM-6b-onnx-u8s8


@@ -1,20 +1,34 @@
# This Dockerfile builds an environment "without local models". To use local models such as chatglm, see docs/Dockerfile+ChatGLM
# How to build: first edit `config.py`, then run docker build -t gpt-academic .
# How to run: docker run --rm -it --net=host gpt-academic
# This Dockerfile builds an environment "without local models". To use local models such as chatglm, or the LaTeX runtime dependencies, see docker-compose.yml
# How to build: first edit `config.py`, then run `docker build -t gpt-academic . `
# How to run (Linux): `docker run --rm -it --net=host gpt-academic `
# How to run (other operating systems, pick any fixed port, e.g. 50923): `docker run --rm -it -e WEB_PORT=50923 -p 50923:50923 gpt-academic `
FROM python:3.11
# Optional step: switch the pip mirror
RUN echo '[global]' > /etc/pip.conf && \
echo 'index-url = https://mirrors.aliyun.com/pypi/simple/' >> /etc/pip.conf && \
echo 'trusted-host = mirrors.aliyun.com' >> /etc/pip.conf
# Enter the working directory
WORKDIR /gpt
COPY requirements.txt .
# Install most dependencies first, so Docker's layer cache speeds up later builds
COPY requirements.txt ./
COPY ./docs/gradio-3.32.2-py3-none-any.whl ./docs/gradio-3.32.2-py3-none-any.whl
RUN pip3 install -r requirements.txt
COPY . .
# Optional step: warm up modules
# Load the project files and install the remaining dependencies
COPY . .
RUN pip3 install -r requirements.txt
# Optional step: warm up modules
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
# Launch
CMD ["python3", "-u", "main.py"]

README.md

@@ -1,51 +1,61 @@
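> **Note**
>
> When installing dependencies, strictly use the **versions specified** in requirements.txt
>
> `pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/`
> 2023.7.8: The Gradio and Pydantic dependencies have changed and `requirements.txt` has been updated. Please **update your code** promptly. When installing dependencies, strictly use the **versions specified** in `requirements.txt`.
>
> `pip install -r requirements.txt`
# <img src="docs/logo.png" width="40" > GPT 学术优化 (GPT Academic)
**If you like this project, please give it a Star. If you come up with better shortcuts or function plugins, pull requests are welcome.**
# <div align=center><img src="docs/logo.png" width="40"> GPT 学术优化 (GPT Academic)</div>
**If you like this project, please give it a Star. If you come up with useful shortcuts or function plugins, pull requests are welcome.**
If you like this project, please give it a Star. If you've come up with more useful academic shortcuts or functional plugins, feel free to open an issue or pull request. We also have a README in [English|](docs/README_EN.md)[日本語|](docs/README_JP.md)[한국어|](https://github.com/mldljyh/ko_gpt_academic)[Русский|](docs/README_RS.md)[Français](docs/README_FR.md) translated by this project itself.
To translate this project to arbitrary language with GPT, read and run [`multi_language.py`](multi_language.py) (experimental).
> **Note**
>
> 1. Only function plugins (buttons) marked in **red** can read files. Some plugins are in the **drop-down menu** of the plugin area. PRs for new plugins are welcomed and handled with **top priority**.
> 1. Only function plugins (buttons) that are **highlighted** can read files. Some plugins are in the **drop-down menu** of the plugin area. PRs for new plugins are welcomed and handled with **top priority**.
>
> 2. The purpose of every file in this project is documented in the self-analysis report [`self_analysis.md`](https://github.com/binary-husky/chatgpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A). As the project evolves, you can click the relevant function plugin at any time to have GPT regenerate this self-analysis report. Frequently asked questions are collected in the [`wiki`](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98).
> 2. The purpose of every file in this project is documented in the [self-analysis report `self_analysis.md`](https://github.com/binary-husky/gpt_academic/wiki/GPTAcademic项目自译解报告). As the project evolves, you can click the relevant function plugin at any time to have GPT regenerate this self-analysis report. FAQ: [`wiki`](https://github.com/binary-husky/gpt_academic/wiki). [Installation](#installation) | [Configuration guide](https://github.com/binary-husky/gpt_academic/wiki/%E9%A1%B9%E7%9B%AE%E9%85%8D%E7%BD%AE%E8%AF%B4%E6%98%8E)
>
> 3. This project is compatible with, and encourages trying, Chinese large language models such as chatglm, RWKV and Pangu. OpenAI and API2D api-keys can coexist; in the config file, write for example `API_KEY="openai-key1,openai-key2,api2d-key3"`. To switch `API_KEY` temporarily, type the temporary `API_KEY` in the input area and press Enter. The new key takes effect immediately.
> 3. This project is compatible with, and encourages trying, Chinese large language models such as ChatGLM and Moss. Multiple api-keys can coexist; in the config file, write for example `API_KEY="openai-key1,openai-key2,azure-key3,api2d-key4"`. To switch `API_KEY` temporarily, type the temporary `API_KEY` in the input area and press Enter. The new key takes effect immediately.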
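
Note 3 above allows several keys in one comma-separated `API_KEY` string. The following is a minimal Python sketch of splitting such a string and picking one key per request as simple load balancing. The prefix rules in `KEY_PREFIXES` are assumptions for illustration only. They are not the project's actual key-detection logic, and all function names here are hypothetical.

```python
import random
from typing import List, Optional

# Hypothetical prefix -> provider mapping, for illustration only.
KEY_PREFIXES = {"sk-": "openai", "fk": "api2d", "azure-": "azure"}


def parse_keys(api_key: str) -> List[str]:
    """Split a comma-separated API_KEY string, dropping blanks and surrounding spaces."""
    return [k.strip() for k in api_key.split(",") if k.strip()]


def provider_of(key: str) -> str:
    """Classify a key by its (assumed) prefix."""
    for prefix, provider in KEY_PREFIXES.items():
        if key.startswith(prefix):
            return provider
    return "unknown"


def pick_key(api_key: str, provider: str, rng: Optional[random.Random] = None) -> str:
    """Pick one key of the requested provider at random (simple load balancing)."""
    candidates = [k for k in parse_keys(api_key) if provider_of(k) == provider]
    if not candidates:
        raise ValueError(f"no API key configured for provider {provider!r}")
    return (rng or random).choice(candidates)
```

Picking uniformly at random spreads requests across keys without shared state between threads. A round-robin counter would need a lock.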
<div align="center">
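Feature | Description
Feature (⭐ = recently added) | Description
--- | ---
⭐[Integrate new models](https://github.com/binary-husky/gpt_academic/wiki/%E5%A6%82%E4%BD%95%E5%88%87%E6%8D%A2%E6%A8%A1%E5%9E%8B) | Baidu [Qianfan](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Nlks5zkzu) and ERNIE Bot, [Tongyi Qianwen (Qwen)](https://modelscope.cn/models/qwen/Qwen-7B-Chat/summary), Shanghai AI-Lab [InternLM](https://github.com/InternLM/InternLM), iFlytek [Spark](https://xinghuo.xfyun.cn/), [LLaMa2](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
One-click polishing | One-click polishing and one-click detection of grammar errors in papers
One-click Chinese-English translation | One-click Chinese-English translation
One-click code explanation | Display, explain, generate, and comment code
[Custom shortcut keys](https://www.bilibili.com/video/BV14s4y1E7jN) | Supports custom shortcut keys
Modular design | Supports powerful custom [function plugins](https://github.com/binary-husky/chatgpt_academic/tree/master/crazy_functions); plugins support [hot reloading](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)
[Self-analysis](https://www.bilibili.com/video/BV1cj411A7VW) | [Function plugin] [Understand this project's source code in one click](https://github.com/binary-husky/chatgpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)
Modular design | Supports powerful custom [function plugins](https://github.com/binary-husky/gpt_academic/tree/master/crazy_functions); plugins support [hot reloading](https://github.com/binary-husky/gpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97)
[Self-analysis](https://www.bilibili.com/video/BV1cj411A7VW) | [Function plugin] [Understand this project's source code in one click](https://github.com/binary-husky/gpt_academic/wiki/chatgpt-academic%E9%A1%B9%E7%9B%AE%E8%87%AA%E8%AF%91%E8%A7%A3%E6%8A%A5%E5%91%8A)
[Program analysis](https://www.bilibili.com/video/BV1cj411A7VW) | [Function plugin] Analyze the source tree of other Python/C/C++/Java/Lua/... projects in one click
Read and [translate](https://www.bilibili.com/video/BV1KT411x7Wn) papers | [Function plugin] Interpret the full text of a LaTeX/PDF paper and generate an abstract in one click
Full-text LaTeX [translation](https://www.bilibili.com/video/BV1nk4y1Y7Js/) and [polishing](https://www.bilibili.com/video/BV1FT411H7c5/) | [Function plugin] Translate or polish a LaTeX paper in one click
Batch comment generation | [Function plugin] Generate function comments in batch in one click
Markdown [Chinese-English translation](https://www.bilibili.com/video/BV1yo4y157jV/) | [Function plugin] Seen the [README](https://github.com/binary-husky/chatgpt_academic/blob/master/docs/README_EN.md) in the 5 languages above?
Markdown [Chinese-English translation](https://www.bilibili.com/video/BV1yo4y157jV/) | [Function plugin] Seen the [README](https://github.com/binary-husky/gpt_academic/blob/master/docs/README_EN.md) in the 5 languages above?
Chat analysis report generation | [Function plugin] Automatically generates a summary report after running
[Full-text PDF paper translation](https://www.bilibili.com/video/BV1KT411x7Wn) | [Function plugin] Extract the title and abstract of a PDF paper and translate the full text (multi-threaded)
[Arxiv assistant](https://www.bilibili.com/video/BV1LM4y1279X) | [Function plugin] Enter an arxiv article URL to translate the abstract and download the PDF in one click
One-click LaTeX proofreading | [Function plugin] Grammarly-style grammar and spelling correction for LaTeX papers, with a side-by-side comparison PDF
[Google Scholar integration assistant](https://www.bilibili.com/video/BV19L411U7ia) | [Function plugin] Given any Google Scholar search page URL, let GPT [write the related work section](https://www.bilibili.com/video/BV1GP411U7Az/) for you
Internet information aggregation + GPT | [Function plugin] In one click, [let GPT fetch information from the internet](https://www.bilibili.com/video/BV1om4y127ck) and then answer questions, so the information never goes stale
Internet information aggregation + GPT | [Function plugin] In one click, [let GPT answer questions with information fetched from the internet](https://www.bilibili.com/video/BV1om4y127ck), so the information never goes stale
⭐Fine-grained Arxiv paper translation ([Docker](https://github.com/binary-husky/gpt_academic/pkgs/container/gpt_academic_with_latex)) | [Function plugin] [Translate arxiv papers at very high quality](https://www.bilibili.com/video/BV1dz4y1v77A/) in one click; in the authors' view, currently the best paper translation tool
⭐[Real-time voice conversation input](https://github.com/binary-husky/gpt_academic/blob/master/docs/use_audio.md) | [Function plugin] Asynchronously [listens to audio](https://www.bilibili.com/video/BV1AV4y187Uy/), segments sentences automatically, and picks the right moment to answer
Formula/image/table display | Shows formulas in both [tex and rendered form](https://user-images.githubusercontent.com/96192199/230598842-1d7fcddd-815d-40ee-af60-baf488a199df.png) at the same time; supports formula and code highlighting
Multi-threaded function plugin support | Calls chatgpt from multiple threads to process [massive amounts of text](https://www.bilibili.com/video/BV1FT411H7c5/) or code in one click
Dark gradio [theme](https://github.com/binary-husky/chatgpt_academic/issues/173) | Append ```/?__dark-theme=true``` to the browser URL to switch to the dark theme
[Multiple LLM models](https://www.bilibili.com/video/BV1wT411p7yf) support, [API2D](https://api2d.com/) interface support | Being served by GPT3.5, GPT4 and [Tsinghua ChatGLM](https://github.com/THUDM/ChatGLM-6B) at the same time must feel pretty good, right?
More LLM models, [huggingface deployment](https://huggingface.co/spaces/qingxu98/gpt-academic) supported | Newly added Newbing test interface (New Bing AI)
…… | ……
Dark [theme](https://github.com/binary-husky/gpt_academic/issues/173) | Append ```/?__theme=dark``` to the browser URL to switch to the dark theme
[Multiple LLM models](https://www.bilibili.com/video/BV1wT411p7yf) support | Being served by GPT3.5, GPT4, [Tsinghua ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) and [Fudan MOSS](https://github.com/OpenLMLab/MOSS) at the same time must feel pretty good, right?
⭐ChatGLM2 fine-tuned models | Load ChatGLM2 fine-tuned models; includes a ChatGLM2 fine-tuning helper plugin
More LLM models, [huggingface deployment](https://huggingface.co/spaces/qingxu98/gpt-academic) supported | Adds the Newbing (New Bing) interface, and Tsinghua [Jittorllms](https://github.com/Jittor/JittorLLMs) with support for [LLaMA](https://github.com/facebookresearch/llama) and [Pangu-α](https://openi.org.cn/pangu/)
⭐[void-terminal](https://github.com/binary-husky/void-terminal) pip package | Call all of this project's function plugins directly from Python without the GUI (in development)
⭐Void Terminal plugin | [Function plugin] Dispatch this project's other plugins directly with natural language
More new features (image generation, etc.) …… | See the end of this document ……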
</div>
@@ -80,115 +90,132 @@ Chat analysis report generation | [Function plugin] Automatically generates a summary report after running
<img src="https://user-images.githubusercontent.com/96192199/232537274-deca0563-7aa6-4b5d-94a2-b7c453c47794.png" width="700" >
</div>
---
## Installation - Method 1: run directly (Windows, Linux or MacOS)
# Installation
### Installation Method I: run directly (Windows, Linux or MacOS)
1. Download the project
```sh
git clone https://github.com/binary-husky/chatgpt_academic.git
cd chatgpt_academic
git clone --depth=1 https://github.com/binary-husky/gpt_academic.git
cd gpt_academic
```
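2. Configure API_KEY
In `config.py`, configure the API KEY and other settings. See the [special network environment settings](https://github.com/binary-husky/gpt_academic/issues/1).
In `config.py`, configure the API KEY and other settings. [Click here for special network environment settings](https://github.com/binary-husky/gpt_academic/issues/1). [Wiki page](https://github.com/binary-husky/gpt_academic/wiki/%E9%A1%B9%E7%9B%AE%E9%85%8D%E7%BD%AE%E8%AF%B4%E6%98%8E).
P.S. At startup, the program first checks for a private configuration file named `config_private.py`. Any setting in that file overrides the setting of the same name in `config.py`. If you understand this loading logic, we strongly recommend creating a new file named `config_private.py` next to `config.py` and moving (copying) your settings from `config.py` into it. `config_private.py` is not tracked by git, which keeps your private information safer.
「 The program first checks for a private configuration file named `config_private.py`. Any setting in that file overrides the setting of the same name in `config.py`. If you understand this loading logic, we strongly recommend creating a new file named `config_private.py` next to `config.py` and copying your settings from `config.py` into it (copying only the entries you changed is enough). 」
「 The project can also be configured through `environment variables`. For the format, see the `docker-compose.yml` file or our [Wiki page](https://github.com/binary-husky/gpt_academic/wiki/%E9%A1%B9%E7%9B%AE%E9%85%8D%E7%BD%AE%E8%AF%B4%E6%98%8E). Configuration priority: `environment variables` > `config_private.py` > `config.py`. 」
3. Install dependencies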
```sh
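# Option I: if you are familiar with python (python 3.9 or later, the newer the better)
# Option I: if you are familiar with python (python 3.9 or later, the newer the better). Note: use the official pip index or the Aliyun pip mirror. To switch the index temporarily: python -m pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
python -m pip install -r requirements.txt
# Note: use the official pip index or the Aliyun pip mirror; other mirrors (such as some university mirrors) may cause problems. To switch the index temporarily: python -m pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
# Option II: if you are not familiar with python, the steps with anaconda are similar:
# II-1: conda create -n gptac_venv python=3.11
# II-2: conda activate gptac_venv
# II-3: python -m pip install -r requirements.txt
# Option II: the steps with Anaconda are similar (https://www.bilibili.com/video/BV1rc411W7Dr)
conda create -n gptac_venv python=3.11    # create the anaconda environment
conda activate gptac_venv                 # activate the anaconda environment
python -m pip install -r requirements.txt # same step as the pip installation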
```
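To use Tsinghua ChatGLM as a backend, you need to install additional dependencies (prerequisites: familiarity with python and a sufficiently powerful computer):
<details><summary>To use Tsinghua ChatGLM2 / Fudan MOSS / RWKV as a backend, click here to expand</summary>
<p>
[Optional step] To use Tsinghua ChatGLM2 / Fudan MOSS as a backend, you need to install additional dependencies (prerequisites: familiarity with Python, experience with Pytorch, and a sufficiently powerful computer):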
```sh
python -m pip install -r request_llm/requirements_chatglm.txt
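# [Optional step I] Support Tsinghua ChatGLM2. Note on Tsinghua ChatGLM: if you see the error "Call ChatGLM fail 不能正常加载ChatGLM的参数" (ChatGLM parameters cannot be loaded), check the following: 1: the installation above installs the torch+cpu build by default; to use cuda, uninstall torch and reinstall torch+cuda. 2: if your machine is not powerful enough to load the model, you can change the model precision in request_llm/bridge_chatglm.py by replacing every AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True) with AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
python -m pip install -r request_llm/requirements_chatglm.txt
# Note: if you see the error "Call ChatGLM fail 不能正常加载ChatGLM的参数" (ChatGLM parameters cannot be loaded), check the following:
# 1: the installation above installs the torch+cpu build by default; to use cuda, uninstall torch and reinstall torch+cuda
# 2: if your machine is not powerful enough to load the model, you can change the model precision in request_llm/bridge_chatglm.py by replacing every AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True) with AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
# [Optional step II] Support Fudan MOSS
python -m pip install -r request_llm/requirements_moss.txt
git clone --depth=1 https://github.com/OpenLMLab/MOSS.git request_llm/moss # note: run this line from the project root directory
# [Optional step III] Support RWKV Runner
# See the wiki: https://github.com/binary-husky/gpt_academic/wiki/%E9%80%82%E9%85%8DRWKV-Runner
# [Optional step IV] Make sure AVAIL_LLM_MODELS in config.py contains the models you want. All currently supported models are listed below (the jittorllms series currently supports only the docker setup):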
AVAIL_LLM_MODELS = ["gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "chatglm", "newbing", "moss"] # + ["jittorllms_rwkv", "jittorllms_pangualpha", "jittorllms_llama"]
```
</p>
</details>
4. Run
```sh
python main.py
```
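5. Test the function plugins
```
- Test the function plugin template function (it asks gpt what happened on this day in history). You can use this function as a template to implement more complex features
Click "[函数插件模板Demo] 历史上的今天" ("[Function plugin template demo] Today in history")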
```
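
Step 2 above gives the configuration priority as `environment variables` > `config_private.py` > `config.py`. The following is a minimal Python sketch of that lookup. It assumes each source is available as a plain mapping. It also assumes environment values, which are always strings, are converted to the type of the default in `config.py`. `read_setting` and `_cast` are hypothetical helpers, not the project's actual loader.

```python
import ast
import os
from typing import Any, Mapping, Optional


def _cast(raw: str, default: Any) -> Any:
    """Convert an environment-variable string to the type of the default value."""
    if isinstance(default, bool):  # check bool before int: bool is a subclass of int
        return raw.strip().lower() in ("1", "true", "yes")
    if isinstance(default, (int, float)):
        return type(default)(raw)
    if isinstance(default, str):
        return raw
    # Lists, dicts, etc. are written as Python literals, e.g. '["gpt-4"]'.
    return ast.literal_eval(raw)


def read_setting(name: str, defaults: Mapping[str, Any],
                 private: Optional[Mapping[str, Any]] = None,
                 env: Optional[Mapping[str, str]] = None) -> Any:
    """Look up one setting: environment variable > config_private.py > config.py."""
    env = os.environ if env is None else env
    default = defaults[name]
    if name in env:
        return _cast(env[name], default)
    if private and name in private:
        return private[name]
    return default
```

For example, `docker run -e WEB_PORT=50923 ...` would yield the integer `50923` for `WEB_PORT` under this scheme, because the `config.py` default is an integer.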
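### Installation Method II: use Docker
## Installation - Method 2: use Docker
1. ChatGPT only (recommended for most people)
0. Deploy the full capability of the project (this is a large image that includes cuda and latex. It is not recommended if your network is slow, your disk is small, or you have no GPU; use option 1 instead) (requires familiarity with the [Nvidia Docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-ubuntu-and-debian) runtime)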
[![fullcapacity](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-all-capacity.yml/badge.svg?branch=master)](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-all-capacity.yml)
``` sh
# Download the project
git clone https://github.com/binary-husky/chatgpt_academic.git
cd chatgpt_academic
# Configure "Proxy", "API_KEY", "WEB_PORT" (e.g. 50923), etc.
# Edit config.py with any text editor
# Install
docker build -t gpt-academic .
# (Last step, option 1) On Linux, `--net=host` is the most convenient
docker run --rm -it --net=host gpt-academic
# (Last step, option 2) On macOS/Windows, you must use the -p option to expose the container port (e.g. 50923) on the host
docker run --rm -it -p 50923:50923 gpt-academic
# Edit docker-compose.yml: keep option 0 and delete the other options, then adjust the option 0 settings by following the comments in the file
docker-compose up
```
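2. ChatGPT+ChatGLM (requires familiarity with Docker, an understanding of the Dockerfile, and a sufficiently powerful computer)
1. ChatGPT + ERNIE Bot + Spark and other online models (recommended for most people)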
[![basic](https://github.com/binary-husky/gpt_academic/actions/workflows/build-without-local-llms.yml/badge.svg?branch=master)](https://github.com/binary-husky/gpt_academic/actions/workflows/build-without-local-llms.yml)
[![basiclatex](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-latex.yml/badge.svg?branch=master)](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-latex.yml)
[![basicaudio](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-audio-assistant.yml/badge.svg?branch=master)](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-audio-assistant.yml)
``` sh
# Edit the Dockerfile
cd docs && nano Dockerfile+ChatGLM
# Build (Dockerfile+ChatGLM is in the docs directory; cd docs first)
docker build -t gpt-academic --network=host -f Dockerfile+ChatGLM .
# Run (1) directly:
docker run --rm -it --net=host --gpus=all gpt-academic
# Run (2) open a shell in the container to make adjustments before running:
docker run --rm -it --net=host --gpus=all gpt-academic bash
# Edit docker-compose.yml: keep option 1 and delete the other options, then adjust the option 1 settings by following the comments in the file
docker-compose up
```
3. ChatGPT + LLAMA + Pangu + RWKV (requires Docker expertise)
P.S. For plugin features that depend on LaTeX, see the Wiki. Alternatively, use option 4 or option 0 directly to get LaTeX support.
2. ChatGPT + ChatGLM2 + MOSS + LLAMA2 + Qwen (requires familiarity with the [Nvidia Docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-ubuntu-and-debian) runtime)
[![chatglm](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-chatglm.yml/badge.svg?branch=master)](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-chatglm.yml)
``` sh
1. Edit docker-compose.yml: delete options 1 and 2 and keep option 3 (based on jittor)
2. Adjust the option 3 settings in docker-compose.yml by following the comments in the file
3. Run docker-compose up in a terminal
# Edit docker-compose.yml: keep option 2 and delete the other options, then adjust the option 2 settings by following the comments in the file
docker-compose up
```
3. ChatGPT + LLAMA + Pangu + RWKV (requires familiarity with the [Nvidia Docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-ubuntu-and-debian) runtime)
[![jittorllms](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-jittorllms.yml/badge.svg?branch=master)](https://github.com/binary-husky/gpt_academic/actions/workflows/build-with-jittorllms.yml)
``` sh
# Edit docker-compose.yml: keep option 3 and delete the other options, then adjust the option 3 settings by following the comments in the file
docker-compose up
```
## Installation - Method 3: other deployment options
### Installation Method III: other deployment options
1. One-click run script.
Windows users who are completely unfamiliar with python can download the one-click run script from the [Release](https://github.com/binary-husky/gpt_academic/releases) page to install the version without local models.
The script is based on [oobabooga](https://github.com/oobabooga/one-click-installers).
1. How to use a reverse-proxy URL / the Microsoft Azure API
2. Run with docker-compose.
Read docker-compose.yml and follow the instructions in it.
3. How to use a reverse-proxy URL
Configure API_URL_REDIRECT as described in `config.py`.
2. Remote cloud server deployment (requires knowledge of and experience with cloud servers)
See [deployment wiki-1](https://github.com/binary-husky/chatgpt_academic/wiki/%E4%BA%91%E6%9C%8D%E5%8A%A1%E5%99%A8%E8%BF%9C%E7%A8%8B%E9%83%A8%E7%BD%B2%E6%8C%87%E5%8D%97)
4. Microsoft Azure API
Configure AZURE_ENDPOINT and the other three related settings as described in `config.py`.
3. Use WSL2 (Windows Subsystem for Linux)
See [deployment wiki-2](https://github.com/binary-husky/chatgpt_academic/wiki/%E4%BD%BF%E7%94%A8WSL2%EF%BC%88Windows-Subsystem-for-Linux-%E5%AD%90%E7%B3%BB%E7%BB%9F%EF%BC%89%E9%83%A8%E7%BD%B2)
5. Remote cloud server deployment (requires knowledge of and experience with cloud servers).
See [deployment wiki-1](https://github.com/binary-husky/gpt_academic/wiki/%E4%BA%91%E6%9C%8D%E5%8A%A1%E5%99%A8%E8%BF%9C%E7%A8%8B%E9%83%A8%E7%BD%B2%E6%8C%87%E5%8D%97)
4. How to run under a sub-path (e.g. `http://localhost/subpath`)
6. [One-click deployment](https://github.com/binary-husky/gpt_academic/issues/993) with Sealos.
7. Use WSL2 (Windows Subsystem for Linux).
See [deployment wiki-2](https://github.com/binary-husky/gpt_academic/wiki/%E4%BD%BF%E7%94%A8WSL2%EF%BC%88Windows-Subsystem-for-Linux-%E5%AD%90%E7%B3%BB%E7%BB%9F%EF%BC%89%E9%83%A8%E7%BD%B2)
8. How to run under a sub-path (e.g. `http://localhost/subpath`).
See the [FastAPI instructions](docs/WithFastapi.md)
5. Run with docker-compose
Read docker-compose.yml and follow the instructions in it
---
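## Custom convenience buttons / custom function plugins
1. Custom convenience buttons (academic shortcuts)
# Advanced Usage
### I: Custom convenience buttons (academic shortcuts)
Open `core_functional.py` in any text editor, add an entry as shown below, and restart the program. (Once the button has been added and is visible, both the prefix and the suffix can be changed while the program runs; the changes take effect without a restart.)
For example: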
```
@@ -204,53 +231,93 @@ docker run --rm -it --net=host --gpus=all gpt-academic bash
<img src="https://user-images.githubusercontent.com/96192199/226899272-477c2134-ed71-4326-810c-29891fe4a508.png" width="500" >
</div>
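2. Custom function plugins
### II: Custom function plugins
Write powerful function plugins to perform any task you can imagine, and some you can't.
Writing and debugging plugins for this project is easy. With basic python knowledge, you can implement your own plugin by following the templates we provide.
For details, see the [function plugin guide](https://github.com/binary-husky/chatgpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97).
For details, see the [function plugin guide](https://github.com/binary-husky/gpt_academic/wiki/%E5%87%BD%E6%95%B0%E6%8F%92%E4%BB%B6%E6%8C%87%E5%8D%97).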
---
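## Other features
# Latest Update
### I: New features
1. Conversation saving. Call `保存当前的对话` (save the current conversation) in the function plugin area to save the current conversation as a readable, restorable html file, as shown:
1. Conversation saving. Call `保存当前的对话` (save the current conversation) in the function plugin area to save the current conversation as a readable, restorable html file.
To restore a previous session, call `载入对话历史存档` (load conversation history archive) in the function plugin area (drop-down menu).
Tip: clicking `载入对话历史存档` without specifying a file shows the cached html archives.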
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/235222390-24a9acc0-680f-49f5-bc81-2f3161f1e049.png" width="500" >
</div>
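Call `载入对话历史存档` (load conversation history archive) in the function plugin area (drop-down menu) to restore a previous session.
2. Report generation. Most plugins generate a work report when they finish
2. ⭐LaTeX/Arxiv paper translation⭐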
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/227503770-fe29ce2c-53fd-47b0-b0ff-93805f0c2ff4.png" height="300" >
<img src="https://user-images.githubusercontent.com/96192199/227504617-7a497bb3-0a2a-4b50-9a8a-95ae60ea7afd.png" height="300" >
<img src="https://user-images.githubusercontent.com/96192199/227504005-efeaefe0-b687-49d0-bf95-2d7b7e66c348.png" height="300" >
<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/002a1a75-ace0-4e6a-94e2-ec1406a746f1" height="250" > ===>
<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/9fdcc391-f823-464f-9322-f8719677043b" height="250" >
</div>
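3. Modular design: a simple interface that supports powerful features
3. Void Terminal (infers user intent from natural-language input and calls other plugins automatically)
- Step 1: enter "Please call the plugin to translate the PDF paper at https://openreview.net/pdf?id=rJl0r3R9KX"
- Step 2: click "虚空终端" (Void Terminal)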
<div align="center">
<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/66f1b044-e9ff-4eed-9126-5d4f3668f1ed" width="500" >
</div>
4. Modular design: a simple interface that supports powerful features
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/229288270-093643c1-0018-487a-81e6-1d7809b6e90f.png" height="400" >
<img src="https://user-images.githubusercontent.com/96192199/227504931-19955f78-45cd-4d1c-adac-e71e50957915.png" height="400" >
</div>
4. An open-source project that can "analyze itself"
5. Analyze other open-source projects
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/226936850-c77d7183-0749-4c1c-9875-fd4891842d0c.png" width="500" >
<img src="https://user-images.githubusercontent.com/96192199/226935232-6b6a73ce-8900-4aee-93f9-733c7e6fef53.png" height="250" >
<img src="https://user-images.githubusercontent.com/96192199/226969067-968a27c1-1b9c-486b-8b81-ab2de8d3f88a.png" height="250" >
</div>
5. Analyzing other open-source projects works just as well
6. A small decorative [live2d](https://github.com/fghrsh/live2d_demo) feature (off by default; enable it in `config.py`)
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/226935232-6b6a73ce-8900-4aee-93f9-733c7e6fef53.png" width="500" >
<img src="https://user-images.githubusercontent.com/96192199/236432361-67739153-73e8-43fe-8111-b61296edabd9.png" width="500" >
</div>
7. Added support for the MOSS large language model
<div align="center">
<img src="https://user-images.githubusercontent.com/96192199/226969067-968a27c1-1b9c-486b-8b81-ab2de8d3f88a.png" width="500" >
<img src="https://user-images.githubusercontent.com/96192199/236639178-92836f37-13af-4fdd-984d-b4450fe30336.png" width="500" >
</div>
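## Versions:
- version 3.5 (Todo): call all function plugins of this project with natural language (high priority)
- version 3.4 (Todo): improve multi-threading support for the local chatglm model
8. OpenAI image generation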
<div align="center">
<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/bc7ab234-ad90-48a0-8d62-f703d9e74665" width="500" >
</div>
9. OpenAI audio transcription and summarization
<div align="center">
<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/709ccf95-3aee-498a-934a-e1c22d3d5d5b" width="500" >
</div>
10. Full-text LaTeX proofreading and correction
<div align="center">
<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/651ccd98-02c9-4464-91e1-77a6b7d1b033" height="200" > ===>
<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/476f66d9-7716-4537-b5c1-735372c25adb" height="200">
</div>
11. Language and theme switching
<div align="center">
<img src="https://github.com/binary-husky/gpt_academic/assets/96192199/b6799499-b6fb-4f0c-9c8e-1b441872f4e8" width="500" >
</div>
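### II: Versions:
- version 3.60 (todo): improve the Void Terminal; add a code interpreter and more plugins
- version 3.53: dynamic selection of UI themes; improved stability; fixed multi-user conflicts
- version 3.50: call all function plugins of this project with natural language (Void Terminal); plugin categories; improved UI; new themes
- version 3.49: support for the Baidu Qianfan platform and ERNIE Bot
- version 3.48: support for Alibaba DAMO Academy Qwen, Shanghai AI-Lab InternLM, and iFlytek Spark
- version 3.46: fully hands-free real-time voice conversation
- version 3.45: custom fine-tuned ChatGLM2 models
- version 3.44: official Azure support; improved UI usability
- version 3.4: + arxiv paper translation and LaTeX paper correction
- version 3.3: + internet information aggregation
- version 3.2: function plugins support more parameter interfaces (conversation saving, interpreting code in any language, and querying any combination of LLMs at the same time)
- version 3.1: query multiple gpt models at the same time; api2d support; load balancing across multiple api keys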
@@ -264,17 +331,41 @@ docker run --rm -it --net=host --gpus=all gpt-academic bash
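- version 2.0: introduced modular function plugins
- version 1.0: basic features
gpt_academic developer QQ group: 734063350
gpt_academic developer QQ group 2: 610599535
- Known issues
- Some browser translation extensions interfere with this software's front end
- The official Gradio release currently has many compatibility bugs; be sure to install Gradio via `requirements.txt`
### III: Themes
Change the theme with the `THEME` option in config.py
1. `Chuanhu-Small-and-Beautiful` [link](https://github.com/GaiZhenbiao/ChuanhuChatGPT/)
## References and learning
### IV: References and learning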
```
The code borrows many designs from other excellent projects, mainly:
The code borrows many designs from other excellent projects, in no particular order:
# Borrowed project 1: borrowed many techniques from ChuanhuChatGPT
# Tsinghua ChatGLM2-6B:
https://github.com/THUDM/ChatGLM2-6B
# Tsinghua JittorLLMs:
https://github.com/Jittor/JittorLLMs
# ChatPaper:
https://github.com/kaixindelele/ChatPaper
# Edge-GPT:
https://github.com/acheong08/EdgeGPT
# ChuanhuChatGPT:
https://github.com/GaiZhenbiao/ChuanhuChatGPT
# Borrowed project 2: Tsinghua ChatGLM-6B
https://github.com/THUDM/ChatGLM-6B
# Oobabooga one-click installer:
https://github.com/oobabooga/one-click-installers
# More
https://github.com/gradio-app/gradio
https://github.com/fghrsh/live2d_demo
```


@@ -3,15 +3,20 @@ def check_proxy(proxies):
import requests
proxies_https = proxies['https'] if proxies is not None else ''
try:
response = requests.get("https://ipapi.co/json/",
proxies=proxies, timeout=4)
response = requests.get("https://ipapi.co/json/", proxies=proxies, timeout=4)
data = response.json()
print(f'查询代理的地理位置,返回的结果是{data}')
# print(f'查询代理的地理位置,返回的结果是{data}')
if 'country_name' in data:
country = data['country_name']
result = f"代理配置 {proxies_https}, 代理所在地:{country}"
elif 'error' in data:
result = f"代理配置 {proxies_https}, 代理所在地未知,IP查询频率受限"
alternative = _check_with_backup_source(proxies)
if alternative is None:
result = f"代理配置 {proxies_https}, 代理所在地未知,IP查询频率受限"
else:
result = f"代理配置 {proxies_https}, 代理所在地:{alternative}"
else:
result = f"代理配置 {proxies_https}, 代理数据解析失败:{data}"
print(result)
return result
except:
@@ -19,6 +24,11 @@ def check_proxy(proxies):
print(result)
return result
def _check_with_backup_source(proxies):
import random, string, requests
random_string = ''.join(random.choices(string.ascii_letters + string.digits, k=32))
try: return requests.get(f"http://{random_string}.edns.ip-api.com/json", proxies=proxies, timeout=4).json()['dns']['geo']
except: return None
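
`check_proxy` reads the country from the ipapi.co response. When that service reports an error, typically rate limiting, it falls back to `_check_with_backup_source`. The following network-free Python sketch mirrors the same branch structure. The backup lookup is passed in as a callable so the logic can be tested offline. `describe_proxy_location` is hypothetical, and its messages are English paraphrases of the originals.

```python
from typing import Callable, Optional


def describe_proxy_location(proxy: str, data: dict,
                            backup: Callable[[], Optional[str]]) -> str:
    """Mirror the branch structure of check_proxy without touching the network."""
    if "country_name" in data:
        return f"Proxy {proxy}, location: {data['country_name']}"
    if "error" in data:
        # Primary service refused (e.g. rate limited): try the backup source once.
        alternative = backup()
        if alternative is None:
            return f"Proxy {proxy}, location unknown (IP lookup rate limited)"
        return f"Proxy {proxy}, location: {alternative}"
    return f"Proxy {proxy}, failed to parse proxy data: {data}"
```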
def backup_and_download(current_version, remote_version):
"""
@@ -94,7 +104,7 @@ def get_current_version():
return current_version
def auto_update():
def auto_update(raise_error=False):
"""
One-click update protocol: check the version and ask for the user's consent
"""
@@ -115,7 +125,7 @@ def auto_update():
with open('./version', 'r', encoding='utf8') as f:
current_version = f.read()
current_version = json.loads(current_version)['version']
if (remote_version - current_version) >= 0.01:
if (remote_version - current_version) >= 0.01-1e-5:
from colorful import print亮黄
print亮黄(
f'\n新版本可用。新版本:{remote_version},当前版本:{current_version}{new_feature}')
@@ -126,22 +136,32 @@ def auto_update():
try:
patch_and_restart(path)
except:
print('更新失败。')
msg = '更新失败。'
if raise_error:
from toolbox import trimmed_format_exc
msg += trimmed_format_exc()
print(msg)
else:
print('自动更新程序:已禁用')
return
else:
return
except:
print('自动更新程序:已禁用')
msg = '自动更新程序:已禁用。建议排查:代理网络配置。'
if raise_error:
from toolbox import trimmed_format_exc
msg += trimmed_format_exc()
print(msg)
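
The updated comparison in `auto_update` subtracts `1e-5` from the `0.01` threshold. Version numbers such as `3.54` are stored as binary floats, so their difference can land slightly below `0.01`, and a naive `>=` would then miss a release. The following Python illustration uses the well-known `0.3 - 0.2` case to show the failure. `should_update` is a hypothetical helper that shows the tolerant check.

```python
def should_update(remote: float, current: float,
                  step: float = 0.01, eps: float = 1e-5) -> bool:
    """True if remote is at least one version step ahead, tolerating float rounding."""
    return (remote - current) >= step - eps


# Binary floats cannot represent 0.3 or 0.2 exactly:
print((0.3 - 0.2) >= 0.1)                 # False: 0.3 - 0.2 == 0.09999999999999998
print(should_update(0.3, 0.2, step=0.1))  # True
```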
def warm_up_modules():
print('正在执行一些模块的预热...')
from toolbox import ProxyNetworkActivate
from request_llm.bridge_all import model_info
enc = model_info["gpt-3.5-turbo"]['tokenizer']
enc.encode("模块预热", disallowed_special=())
enc = model_info["gpt-4"]['tokenizer']
enc.encode("模块预热", disallowed_special=())
with ProxyNetworkActivate("Warmup_Modules"):
enc = model_info["gpt-3.5-turbo"]['tokenizer']
enc.encode("模块预热", disallowed_special=())
enc = model_info["gpt-4"]['tokenizer']
enc.encode("模块预热", disallowed_special=())
if __name__ == '__main__':
import os


@@ -34,58 +34,28 @@ def print亮紫(*kw,**kargs):
def print亮靛(*kw,**kargs):
print("\033[1;36m",*kw,"\033[0m",**kargs)
def print亮红(*kw,**kargs):
print("\033[1;31m",*kw,"\033[0m",**kargs)
def print亮绿(*kw,**kargs):
print("\033[1;32m",*kw,"\033[0m",**kargs)
def print亮黄(*kw,**kargs):
print("\033[1;33m",*kw,"\033[0m",**kargs)
def print亮蓝(*kw,**kargs):
print("\033[1;34m",*kw,"\033[0m",**kargs)
def print亮紫(*kw,**kargs):
print("\033[1;35m",*kw,"\033[0m",**kargs)
def print亮靛(*kw,**kargs):
print("\033[1;36m",*kw,"\033[0m",**kargs)
print_red = print红
print_green = print绿
print_yellow = print黄
print_blue = print蓝
print_purple = print紫
print_indigo = print靛
print_bold_red = print亮红
print_bold_green = print亮绿
print_bold_yellow = print亮黄
print_bold_blue = print亮蓝
print_bold_purple = print亮紫
print_bold_indigo = print亮靛
if not stdout.isatty():
# output is redirected: disable colors so ANSI escape codes do not corrupt the log file
print红 = print
print绿 = print
print黄 = print
print蓝 = print
print紫 = print
print靛 = print
print亮红 = print
print亮绿 = print
print亮黄 = print
print亮蓝 = print
print亮紫 = print
print亮靛 = print
print_red = print
print_green = print
print_yellow = print
print_blue = print
print_purple = print
print_indigo = print
print_bold_red = print
print_bold_green = print
print_bold_yellow = print
print_bold_blue = print
print_bold_purple = print
print_bold_indigo = print
# Do you like the elegance of Chinese characters?
def sprint红(*kw):
return "\033[0;31m"+' '.join(kw)+"\033[0m"
def sprint绿(*kw):
return "\033[0;32m"+' '.join(kw)+"\033[0m"
def sprint黄(*kw):
return "\033[0;33m"+' '.join(kw)+"\033[0m"
def sprint蓝(*kw):
return "\033[0;34m"+' '.join(kw)+"\033[0m"
def sprint紫(*kw):
return "\033[0;35m"+' '.join(kw)+"\033[0m"
def sprint靛(*kw):
return "\033[0;36m"+' '.join(kw)+"\033[0m"
def sprint亮红(*kw):
return "\033[1;31m"+' '.join(kw)+"\033[0m"
def sprint亮绿(*kw):
return "\033[1;32m"+' '.join(kw)+"\033[0m"
def sprint亮黄(*kw):
return "\033[1;33m"+' '.join(kw)+"\033[0m"
def sprint亮蓝(*kw):
return "\033[1;34m"+' '.join(kw)+"\033[0m"
def sprint亮紫(*kw):
return "\033[1;35m"+' '.join(kw)+"\033[0m"
def sprint亮靛(*kw):
return "\033[1;36m"+' '.join(kw)+"\033[0m"

config.py

@@ -1,16 +1,27 @@
# [step 1]>> e.g.: API_KEY = "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" (this key is invalid)
API_KEY = "sk-此处填API密钥" # Multiple API-KEYs can be given at once, separated by commas, e.g. API_KEY = "sk-openaikey1,sk-openaikey2,fkxxxx-api2dkey1,fkxxxx-api2dkey2"
"""
以下所有配置也都支持利用环境变量覆写,环境变量配置格式见docker-compose.yml。
读取优先级:环境变量 > config_private.py > config.py
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
All the following configurations also support using environment variables to override,
and the environment variable configuration format can be seen in docker-compose.yml.
Configuration reading priority: environment variable > config_private.py > config.py
"""
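# [step 2]>> Set to True to use a proxy; if deploying directly on an overseas server, leave this unchanged
# [step 1]>> API_KEY = "sk-123456789xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx123456789". In rare cases an organization ID such as org-123456789abcdefghijklmno is also required; scroll down to the API_ORG setting
API_KEY = "此处填API密钥" # Multiple API keys can be given at once, separated by ASCII commas, e.g. API_KEY = "sk-openaikey1,sk-openaikey2,fkxxxx-api2dkey3,azure-apikey4"
# [step 2]>> Set to True to use a proxy. If deploying directly on an overseas server, leave this unchanged; the same applies when using a local model or one without regional restrictions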
USE_PROXY = False
if USE_PROXY:
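    # Format: [protocol]:// [address] :[port]. Before filling this in, remember to set USE_PROXY to True; if deploying directly on an overseas server, leave this unchanged
    # e.g. "socks5h://localhost:11284"
    # [protocol] The common protocols are socks5h and http; e.g. the default local protocol of v2**y and ss* is socks5h, while that of cl**h is http
    # [address] If unsure, localhost or 127.0.0.1 is always right (localhost means the proxy software runs on this machine)
    # [port] Look it up in your proxy software's settings. The interfaces differ, but the port number should be in the most prominent place
    # Proxy address: open your proxy software and check the proxy's protocol (socks5/http), address (localhost) and port (11284)
    """
    Format: [protocol]:// [address] :[port]. Before filling this in, remember to set USE_PROXY to True; if deploying directly on an overseas server, leave this unchanged
    <Configuration tutorial & video tutorial> https://github.com/binary-husky/gpt_academic/issues/1>
    [protocol] The common protocols are socks5h and http; e.g. the default local protocol of v2**y and ss* is socks5h, while that of cl**h is http
    [address] If unsure, localhost or 127.0.0.1 is always right (localhost means the proxy software runs on this machine)
    [port] Look it up in your proxy software's settings. The interfaces differ, but the port number should be in the most prominent place
    """
    # Proxy address: open your proxy software and check the proxy's protocol (socks5h / http), address (localhost) and port (11284)
    proxies = {
        #          [protocol]://  [address]  :[port]
        "http": "socks5h://localhost:11284", # another example: "http": "http://127.0.0.1:7890",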
@@ -19,59 +30,234 @@ if USE_PROXY:
else:
    proxies = None
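You can check the `[protocol]://[address]:[port]` format described above with the standard library before putting the value in `proxies`. The following is a minimal sketch. `parse_proxy` is a hypothetical name. The accepted protocol set is an assumption: it contains the two protocols the comments mention plus common variants.

```python
from urllib.parse import urlsplit

# Assumption: socks5h/http are named in the comments above; socks5/https are common variants.
ALLOWED_SCHEMES = {"socks5h", "socks5", "http", "https"}


def parse_proxy(url: str) -> tuple[str, str, int]:
    """Split a proxy URL like "socks5h://localhost:11284" into (protocol, address, port)."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported proxy protocol: {parts.scheme!r}")
    # parts.port itself raises ValueError for a non-numeric or out-of-range port.
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"proxy URL must look like protocol://address:port, got {url!r}")
    return parts.scheme, parts.hostname, parts.port
```

A value that is missing its protocol, such as `"localhost:11284"`, is rejected. That is the most common mistake the comments above warn about.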
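# [step 3]>> In multi-threaded function plugins, how many threads may access OpenAI at the same time by default. Free trial users are limited to 3 requests per minute; pay-as-you-go users to 3500 per minute
# In short: free users should use 3; users who have linked a credit card to OpenAI can use 16 or higher. To raise the limits, see https://platform.openai.com/docs/guides/rate-limits/overview
# ------------------------------------ The following settings can improve the experience, but in most cases need no changes ------------------------------------
# Redirect URLs to replace API_URL (high-risk setting! Do not modify under normal circumstances! By changing it, you fully expose your API-KEY and conversation privacy to the intermediary you configure)
# Format: API_URL_REDIRECT = {"https://api.openai.com/v1/chat/completions": "enter the redirected URL for api.openai.com here"}
# Example: API_URL_REDIRECT = {"https://api.openai.com/v1/chat/completions": "https://reverse-proxy-url/v1/chat/completions"}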
API_URL_REDIRECT = {}
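# In multi-threaded function plugins, how many threads may access OpenAI at the same time by default. Free trial users are limited to 3 requests per minute; pay-as-you-go users to 3500 per minute
# In short: free $5-credit users should use 3; users who have linked a credit card to OpenAI can use 16 or higher. To raise the limits, see https://platform.openai.com/docs/guides/rate-limits/overview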
DEFAULT_WORKER_NUM = 3
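# [step 4]>> The following settings can improve the experience, but in most cases need no changes
# Height of the chat window
# Color theme, options: ["Default", "Chuanhu-Small-and-Beautiful", "High-Contrast"]
# More themes: see the Gradio theme gallery https://huggingface.co/spaces/gradio/theme-gallery options: ["Gstaff/Xkcd", "NoCrypt/Miku", ...]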
THEME = "Default"
AVAIL_THEMES = ["Default", "Chuanhu-Small-and-Beautiful", "High-Contrast", "Gstaff/Xkcd", "NoCrypt/Miku"]
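# Height of the chat window (only takes effect when LAYOUT="TOP-DOWN")
CHATBOT_HEIGHT = 1115
# Code highlighting
CODE_HIGHLIGHT = True
# Window layout
LAYOUT = "LEFT-RIGHT" # "LEFT-RIGHT" (side-by-side layout) # "TOP-DOWN" (stacked layout)
DARK_MODE = True # dark mode / light mode
LAYOUT = "LEFT-RIGHT" # "LEFT-RIGHT" (side-by-side layout) # "TOP-DOWN" (stacked layout)
# Dark mode / light mode
DARK_MODE = True
# How long to wait for OpenAI after sending a request before treating it as a timeout
TIMEOUT_SECONDS = 30
# Port of the web page; -1 means a random port
WEB_PORT = -1
# Retry limit when OpenAI does not respond (network lag, proxy failure, invalid KEY)
MAX_RETRY = 2
# OpenAI model selection (gpt4 is currently only open to approved applicants; to try gpt-4, consider api2d)
# Default plugin categories ('对话' conversation, '编程' programming, '学术' academic, '智能体' agents)
DEFAULT_FN_GROUPS = ['对话', '编程', '学术', '智能体']
# Model selection (note: LLM_MODEL is the model selected by default; it *must* be included in the AVAIL_LLM_MODELS list)
LLM_MODEL = "gpt-3.5-turbo" # options ↓↓↓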
AVAIL_LLM_MODELS = ["gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "chatglm", "newbing"]
AVAIL_LLM_MODELS = ["gpt-3.5-turbo-16k", "gpt-3.5-turbo", "azure-gpt-3.5", "api2d-gpt-3.5-turbo",
"gpt-4", "gpt-4-32k", "azure-gpt-4", "api2d-gpt-4", "chatglm", "moss", "newbing", "stack-claude"]
# P.S. Other available models include ["qianfan", "llama2", "qwen", "gpt-3.5-turbo-0613", "gpt-3.5-turbo-16k-0613",
# "spark", "sparkv2", "chatglm_onnx", "claude-1-100k", "claude-2", "internlm", "jittorllms_pangualpha", "jittorllms_llama"]
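The comment above requires `LLM_MODEL` to appear in `AVAIL_LLM_MODELS`. The following is a minimal sketch of a startup check for that rule. `check_model_config` is a hypothetical name. Raising an error, rather than silently appending the model, is one possible policy and not necessarily the project's.

```python
def check_model_config(llm_model: str, avail_llm_models: list[str]) -> None:
    """Raise if the default model is not among the selectable models."""
    if llm_model not in avail_llm_models:
        raise ValueError(
            f"LLM_MODEL={llm_model!r} must be included in AVAIL_LLM_MODELS {avail_llm_models}"
        )


# Example values copied from the config above.
LLM_MODEL = "gpt-3.5-turbo"
AVAIL_LLM_MODELS = ["gpt-3.5-turbo-16k", "gpt-3.5-turbo", "azure-gpt-3.5", "api2d-gpt-3.5-turbo",
                    "gpt-4", "gpt-4-32k", "azure-gpt-4", "api2d-gpt-4", "chatglm", "moss", "newbing", "stack-claude"]
check_model_config(LLM_MODEL, AVAIL_LLM_MODELS)  # passes silently
```

A model listed only in the "P.S." comment, such as `"qianfan"`, fails this check until you also add it to `AVAIL_LLM_MODELS`.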
# Baidu Qianfan (LLM_MODEL="qianfan")
BAIDU_CLOUD_API_KEY = ''
BAIDU_CLOUD_SECRET_KEY = ''
BAIDU_CLOUD_QIANFAN_MODEL = 'ERNIE-Bot' # options: "ERNIE-Bot" (Wenxin Yiyan), "ERNIE-Bot-turbo", "BLOOMZ-7B", "Llama-2-70B-Chat", "Llama-2-13B-Chat", "Llama-2-7B-Chat"
# To use a fine-tuned ChatGLM2 model, set LLM_MODEL="chatglmft" and specify the model path here
CHATGLM_PTUNING_CHECKPOINT = "" # e.g. "/home/hmp/ChatGLM2-6B/ptuning/output/6b-pt-128-1e-2/checkpoint-100"
# Execution device for local LLMs such as ChatGLM: CPU/GPU
LOCAL_MODEL_DEVICE = "cpu" # option: "cuda"
LOCAL_MODEL_QUANT = "FP16" # default "FP16"; "INT4" enables the INT4-quantized version; "INT8" enables the INT8-quantized version
# Number of gradio worker threads (no need to change)
CONCURRENT_COUNT = 100
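# Add a mascot ("kanban musume") decoration
# Whether to clear the input box automatically on submit
AUTO_CLEAR_TXT = False
# Add a live2d decoration
ADD_WAIFU = False
# Set usernames and passwords (no need to change; this feature is unstable and depends on the gradio version and the network; not recommended for local use)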
# [("username", "password"), ("username2", "password2"), ...]
AUTHENTICATION = []
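# Redirect URLs to replace API_URL (do not modify under normal circumstances!!)
# High-risk setting: by changing it, you fully expose your API-KEY and conversation privacy to the intermediary you configure
# Format: {"https://api.openai.com/v1/chat/completions": "enter the redirected URL for api.openai.com here"}
# e.g. API_URL_REDIRECT = {"https://api.openai.com/v1/chat/completions": "https://ai.open.com/api/conversation"}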
API_URL_REDIRECT = {}
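`API_URL_REDIRECT` is a lookup table from the official endpoint to a replacement. The following is a minimal sketch of how such a table would be applied; `resolve_endpoint` is a hypothetical name. As the warning above says, every request, including the API key in its headers, then goes to whatever URL the table points at.

```python
API_URL_REDIRECT = {
    "https://api.openai.com/v1/chat/completions": "https://reverse-proxy-url/v1/chat/completions",
}


def resolve_endpoint(url: str, redirect: dict[str, str] = API_URL_REDIRECT) -> str:
    """Return the redirected URL if one is configured, otherwise the URL unchanged."""
    return redirect.get(url, url)
```

With the default empty table (`API_URL_REDIRECT = {}`), every URL passes through unchanged.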
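# To run under a sub-path (do not modify under normal circumstances!! main.py must also be modified for this to take effect!)
CUSTOM_PATH = "/"
# To use newbing, put newbing's long cookie here
# In rare cases, an official OpenAI key must be used together with an organization ID (format like org-xxxxxxxxxxxxxxxxxxxxxxxx)
API_ORG = ""
# To use Slack Claude, see request_llm/README.md for a tutorial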
SLACK_CLAUDE_BOT_ID = ''
SLACK_CLAUDE_USER_TOKEN = ''
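# To use AZURE, see the separate document docs\use_azure.md (replace the placeholders below with your own resource name, key and deployment name)
AZURE_ENDPOINT = "https://你亲手写的api名称.openai.azure.com/"
AZURE_API_KEY = "填入azure openai api的密钥" # Better to put the key in API_KEY directly; this option will soon be deprecated
AZURE_ENGINE = "填入你亲手写的部署名" # read docs\use_azure.md
# Use Newbing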
NEWBING_STYLE = "creative" # ["creative", "balanced", "precise"]
NEWBING_COOKIES = """
your bing cookies here
put your new bing cookies here
"""
# Alibaba Cloud real-time speech recognition (difficult to configure; recommended for advanced users only), see https://github.com/binary-husky/gpt_academic/blob/master/docs/use_audio.md
ENABLE_AUDIO = False
ALIYUN_TOKEN="" # e.g. f37f30e0f9934c34a992f6f64f7eba4f
ALIYUN_APPKEY="" # e.g. RoPlZrM88DnAFkZK
ALIYUN_ACCESSKEY="" # (no need to fill in)
ALIYUN_SECRET="" # (no need to fill in)
# Connect the iFlytek Spark LLM https://console.xfyun.cn/services/iat
XFYUN_APPID = "00000000"
XFYUN_API_SECRET = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
XFYUN_API_KEY = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
# Claude API KEY
ANTHROPIC_API_KEY = ""
# Custom API KEY format
CUSTOM_API_KEY_PATTERN = ""
# HUGGINGFACE token, used when downloading LLAMA https://huggingface.co/docs/hub/security-tokens
HUGGINGFACE_ACCESS_TOKEN = "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"
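# GROBID server addresses (listing several balances the load), used for high-quality PDF parsing
# How to get one: duplicate the space https://huggingface.co/spaces/qingxu98/grobid, set it to public, then GROBID_URL = "https://(your hf username, e.g. qingxu98)-(the space name you chose, e.g. grobid).hf.space"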
GROBID_URLS = [
"https://qingxu98-grobid.hf.space","https://qingxu98-grobid2.hf.space","https://qingxu98-grobid3.hf.space",
"https://shaocongma-grobid.hf.space","https://FBR123-grobid.hf.space", "https://yeku-grobid.hf.space",
]
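Listing several `GROBID_URLS` spreads PDF-parsing requests across servers. This section does not show whether the project picks a server at random or in turn. The following sketch uses simple round-robin with `itertools.cycle` as an illustration. The `RoundRobin` class is hypothetical.

```python
import itertools
import threading

GROBID_URLS = [
    "https://qingxu98-grobid.hf.space", "https://qingxu98-grobid2.hf.space", "https://qingxu98-grobid3.hf.space",
]


class RoundRobin:
    """Thread-safe round-robin picker over a fixed list of server URLs (illustrative helper)."""

    def __init__(self, urls: list[str]):
        if not urls:
            raise ValueError("at least one URL is required")
        self._cycle = itertools.cycle(urls)
        self._lock = threading.Lock()

    def next_url(self) -> str:
        # itertools.cycle is not documented as thread-safe, so guard it with a lock.
        with self._lock:
            return next(self._cycle)


picker = RoundRobin(GROBID_URLS)
```

Round-robin gives each server an equal share of requests. A random choice (`random.choice(GROBID_URLS)`) also works, and it needs no shared state.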
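# Whether to allow changing this page's configuration through natural-language descriptions; this feature carries some risk and is off by default
ALLOW_RESET_CONFIG = False
# Temporary upload folder; do not modify
PATH_PRIVATE_UPLOAD = "private_upload"
# Log folder; do not modify
PATH_LOGGING = "gpt_log"
# Situations other than connecting to OpenAI in which the proxy may be used; do not modify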
WHEN_TO_USE_PROXY = ["Download_LLM", "Download_Gradio_Theme", "Connect_Grobid", "Warmup_Modules"]
"""
Diagram of how the online LLM settings relate
├── "gpt-3.5-turbo" and other OpenAI models
│   ├── API_KEY
│   ├── CUSTOM_API_KEY_PATTERN (rarely used)
│   ├── API_ORG (rarely used)
│   └── API_URL_REDIRECT (rarely used)
├── "azure-gpt-3.5" and other Azure models
│   ├── API_KEY
│   ├── AZURE_ENDPOINT
│   ├── AZURE_API_KEY
│   ├── AZURE_ENGINE
│   └── API_URL_REDIRECT
├── "spark" iFlytek Spark cognitive LLM (spark & sparkv2)
│   ├── XFYUN_APPID
│   ├── XFYUN_API_SECRET
│   └── XFYUN_API_KEY
├── "claude-1-100k" and other Claude models
│   └── ANTHROPIC_API_KEY
├── "stack-claude"
│   ├── SLACK_CLAUDE_BOT_ID
│   └── SLACK_CLAUDE_USER_TOKEN
├── "qianfan" Baidu Qianfan model library
│   ├── BAIDU_CLOUD_QIANFAN_MODEL
│   ├── BAIDU_CLOUD_API_KEY
│   └── BAIDU_CLOUD_SECRET_KEY
└── "newbing" (the Newbing interface is no longer stable; not recommended)
    ├── NEWBING_STYLE
    └── NEWBING_COOKIES

Diagram of the GUI layout settings
├── CHATBOT_HEIGHT  height of the chat window
├── CODE_HIGHLIGHT  code highlighting
├── LAYOUT  window layout
├── DARK_MODE  dark mode / light mode
├── DEFAULT_FN_GROUPS  default plugin categories
├── THEME  color theme
├── AUTO_CLEAR_TXT  whether to clear the input box automatically on submit
├── ADD_WAIFU  add a live2d decoration
└── ALLOW_RESET_CONFIG  whether to allow changing this page's configuration through natural-language descriptions (somewhat risky)

Diagram of the plugin online-service settings
├── Voice features
│   ├── ENABLE_AUDIO
│   ├── ALIYUN_TOKEN
│   ├── ALIYUN_APPKEY
│   ├── ALIYUN_ACCESSKEY
│   └── ALIYUN_SECRET
└── Precise PDF parsing
    └── GROBID_URLS
"""


@@ -1,20 +1,26 @@
# The 'primary' color corresponds to primary_hue in theme.py
# The 'secondary' color corresponds to neutral_hue in theme.py
# The 'stop' color corresponds to color_er in theme.py
# The default button color is secondary
import importlib
from toolbox import clear_line_break
def get_core_functions():
return {
"英语学术润色": {
# Prefix
# Prefix: prepended to your input. For example, use it to describe your request, such as translating, explaining code or polishing
"Prefix": r"Below is a paragraph from an academic paper. Polish the writing to meet the academic style, " +
r"improve the spelling, grammar, clarity, concision and overall readability. When necessary, rewrite the whole sentence. " +
r"Furthermore, list all modifications and explain the reasons for them in a markdown table." + "\n\n",
# Suffix
r"Firstly, you should provide the polished paragraph. "
r"Secondly, you should list all your modifications and explain the reasons for them in a markdown table." + "\n\n",
# Suffix: appended to your input. For example, together with the prefix it can wrap your input in quotation marks
"Suffix": r"",
"Color": r"secondary", # button color
# Button color (default secondary)
"Color": r"secondary",
# Whether the button is visible (default True, i.e. visible)
"Visible": True,
# Whether to clear the history when triggered (default False, i.e. the previous conversation history is kept)
"AutoClearHistory": False
},
"中文学术润色": {
"Prefix": r"作为一名中文学术论文写作改进助理,你的任务是改进所提供文本的拼写、语法、清晰、简洁和整体可读性," +
@@ -22,17 +28,18 @@ def get_core_functions():
"Suffix": r"",
},
"查找语法错误": {
"Prefix": r"Can you help me ensure that the grammar and the spelling are correct? " +
r"Do not try to polish the text; if no mistake is found, tell me that this paragraph is good. " +
r"If you find grammar or spelling mistakes, please list mistakes you find in a two-column markdown table, " +
r"put the original text in the first column, " +
r"put the corrected text in the second column and highlight the key words you fixed.""\n"
"Prefix": r"Help me ensure that the grammar and the spelling are correct. "
r"Do not try to polish the text; if no mistake is found, tell me that this paragraph is good. "
r"If you find grammar or spelling mistakes, please list mistakes you find in a two-column markdown table, "
r"put the original text in the first column, "
r"put the corrected text in the second column and highlight the key words you fixed. "
r"Finally, please provide the proofread text.""\n\n"
r"Example:""\n"
r"Paragraph: How is you? Do you knows what is it?""\n"
r"| Original sentence | Corrected sentence |""\n"
r"| :--- | :--- |""\n"
r"| How **is** you? | How **are** you? |""\n"
r"| Do you **knows** what **is** **it**? | Do you **know** what **it** **is** ? |""\n"
r"| Do you **knows** what **is** **it**? | Do you **know** what **it** **is** ? |""\n\n"
r"Below is a paragraph from an academic paper. "
r"You need to report all grammar and spelling mistakes as in the example above."
+ "\n\n",
@@ -58,14 +65,34 @@ def get_core_functions():
"英译中": {
"Prefix": r"翻译成地道的中文:" + "\n\n",
"Suffix": r"",
"Visible": False,
},
"找图片": {
"Prefix": r"我需要你找一张网络图片。使用Unsplash API(https://source.unsplash.com/960x640/?<英语关键词>)获取图片URL," +
r"然后请使用Markdown格式封装,并且不要有反斜线,不要用代码块。现在,请按以下描述给我发送图片" + "\n\n",
"Suffix": r"",
"Visible": False,
},
"解释代码": {
"Prefix": r"请解释以下代码:" + "\n```\n",
"Suffix": "\n```\n",
},
"参考文献转Bib": {
"Prefix": r"Here are some bibliography items, please transform them into bibtex style. " +
r"Note that the reference styles may be of more than one kind; you should transform each item correctly. " +
r"Items to be transformed:",
"Visible": False,
"Suffix": r"",
}
}
def handle_core_functionality(additional_fn, inputs, history, chatbot):
    import core_functional
    importlib.reload(core_functional)    # hot-reload the prompts
    core_functional = core_functional.get_core_functions()
    if "PreProcess" in core_functional[additional_fn]: inputs = core_functional[additional_fn]["PreProcess"](inputs)  # apply the preprocessing function (if any)
    inputs = core_functional[additional_fn]["Prefix"] + inputs + core_functional[additional_fn]["Suffix"]
    if core_functional[additional_fn].get("AutoClearHistory", False):
        history = []
    return inputs, history
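`handle_core_functionality` does three things. It runs an optional `PreProcess` hook, places the user's text between the entry's `Prefix` and `Suffix`, and empties the history when `AutoClearHistory` is set. The following is a self-contained sketch of that transformation without the hot reload. `apply_core_function` and the two entries are hypothetical; the real table comes from `get_core_functions()`.

```python
def apply_core_function(entry: dict, inputs: str, history: list[str]) -> tuple[str, list[str]]:
    """Mirror of handle_core_functionality for a single entry, without the hot reload."""
    if "PreProcess" in entry:
        inputs = entry["PreProcess"](inputs)  # optional preprocessing hook
    inputs = entry["Prefix"] + inputs + entry["Suffix"]
    if entry.get("AutoClearHistory", False):
        history = []  # start from a clean conversation
    return inputs, history


# Hypothetical entries shaped like the ones in get_core_functions().
core_functions = {
    "Explain code": {"Prefix": 'Please explain the following code: "', "Suffix": '"'},
    "Polish (fresh)": {"Prefix": "Polish: ", "Suffix": "", "AutoClearHistory": True,
                       "PreProcess": str.strip},
}
```

The history is replaced with a new empty list instead of being cleared in place, so the caller's original list stays intact.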


@@ -2,18 +2,18 @@ from toolbox import HotReload # HotReload 的意思是热更新,修改函数
def get_crazy_functions():
###################### First group of plugins ###########################
from crazy_functions.读文章写摘要 import 读文章写摘要
from crazy_functions.生成函数注释 import 批量生成函数注释
from crazy_functions.解析项目源代码 import 解析项目本身
from crazy_functions.解析项目源代码 import 解析一个Python项目
from crazy_functions.解析项目源代码 import 解析一个Matlab项目
from crazy_functions.解析项目源代码 import 解析一个C项目的头文件
from crazy_functions.解析项目源代码 import 解析一个C项目
from crazy_functions.解析项目源代码 import 解析一个Golang项目
from crazy_functions.解析项目源代码 import 解析一个Rust项目
from crazy_functions.解析项目源代码 import 解析一个Java项目
from crazy_functions.解析项目源代码 import 解析一个前端项目
from crazy_functions.高级功能函数模板 import 高阶功能模板函数
from crazy_functions.代码重写为全英文_多线程 import 全项目切换英文
from crazy_functions.Latex全文润色 import Latex英文润色
from crazy_functions.询问多个大语言模型 import 同时问询
from crazy_functions.解析项目源代码 import 解析一个Lua项目
@@ -23,218 +23,564 @@ def get_crazy_functions():
from crazy_functions.对话历史存档 import 对话历史存档
from crazy_functions.对话历史存档 import 载入对话历史存档
from crazy_functions.对话历史存档 import 删除所有本地对话历史记录
from crazy_functions.辅助功能 import 清除缓存
from crazy_functions.批量Markdown翻译 import Markdown英译中
function_plugins = {
"解析整个Python项目": {
"Color": "stop", # button color
"Function": HotReload(解析一个Python项目)
},
"载入对话历史存档(先上传存档或输入路径)": {
"Color": "stop",
"AsButton":False,
"Function": HotReload(载入对话历史存档)
},
"删除所有本地对话历史记录(请谨慎操作)": {
"AsButton":False,
"Function": HotReload(删除所有本地对话历史记录)
},
"[测试功能] 解析Jupyter Notebook文件": {
"Color": "stop",
"AsButton":False,
"Function": HotReload(解析ipynb文件),
"AdvancedArgs": True, # when invoked, open the advanced-argument input area (default False)
"ArgsReminder": "若输入0,则不解析notebook中的Markdown块", # hint shown in the advanced-argument input area
},
"批量总结Word文档": {
"Color": "stop",
"Function": HotReload(总结word文档)
},
"解析整个C++项目头文件": {
"Color": "stop", # button color
"AsButton": False, # add to the dropdown menu
"Function": HotReload(解析一个C项目的头文件)
},
"解析整个C++项目(.cpp/.hpp/.c/.h": {
"Color": "stop", # button color
"AsButton": False, # add to the dropdown menu
"Function": HotReload(解析一个C项目)
},
"解析整个Go项目": {
"Color": "stop", # button color
"AsButton": False, # add to the dropdown menu
"Function": HotReload(解析一个Golang项目)
},
"解析整个Java项目": {
"Color": "stop", # button color
"AsButton": False, # add to the dropdown menu
"Function": HotReload(解析一个Java项目)
},
"解析整个前端项目js,ts,css等": {
"Color": "stop", # button color
"AsButton": False, # add to the dropdown menu
"Function": HotReload(解析一个前端项目)
},
"解析整个Lua项目": {
"Color": "stop", # button color
"AsButton": False, # add to the dropdown menu
"Function": HotReload(解析一个Lua项目)
},
"解析整个CSharp项目": {
"Color": "stop", # button color
"AsButton": False, # add to the dropdown menu
"Function": HotReload(解析一个CSharp项目)
},
"读Tex论文写摘要": {
"Color": "stop", # button color
"Function": HotReload(读文章写摘要)
},
"Markdown/Readme英译中": {
# HotReload means hot reloading: after you edit the plugin code, the change takes effect without restarting the program
"Color": "stop",
"Function": HotReload(Markdown英译中)
},
"批量生成函数注释": {
"Color": "stop", # button color
"AsButton": False, # add to the dropdown menu
"Function": HotReload(批量生成函数注释)
},
"保存当前的对话": {
"Function": HotReload(对话历史存档)
},
"[多线程Demo] 解析此项目本身(源码自译解)": {
"AsButton": False, # add to the dropdown menu
"Function": HotReload(解析项目本身)
},
"[老旧的Demo] 把本项目源代码切换成全英文": {
# HotReload means hot reloading: after you edit the plugin code, the change takes effect without restarting the program
"AsButton": False, # add to the dropdown menu
"Function": HotReload(全项目切换英文)
},
"[插件demo] 历史上的今天": {
# HotReload means hot reloading: after you edit the plugin code, the change takes effect without restarting the program
"Function": HotReload(高阶功能模板函数)
},
}
###################### Second group of plugins ###########################
# [Second group of plugins]: thoroughly tested
from crazy_functions.批量总结PDF文档 import 批量总结PDF文档
from crazy_functions.批量总结PDF文档pdfminer import 批量总结PDF文档pdfminer
from crazy_functions.批量翻译PDF文档_多线程 import 批量翻译PDF文档
from crazy_functions.谷歌检索小助手 import 谷歌检索小助手
from crazy_functions.理解PDF文档内容 import 理解PDF文档内容标准文件输入
from crazy_functions.Latex全文润色 import Latex中文润色
from crazy_functions.Latex全文润色 import Latex英文纠错
from crazy_functions.Latex全文翻译 import Latex中译英
from crazy_functions.Latex全文翻译 import Latex英译中
from crazy_functions.批量Markdown翻译 import Markdown中译英
from crazy_functions.虚空终端 import 虚空终端
function_plugins.update({
"批量翻译PDF文档多线程": {
function_plugins = {
"虚空终端": {
"Group": "对话|编程|学术|智能体",
"Color": "stop",
"AsButton": True, # show as a button
"AsButton": True,
"Function": HotReload(虚空终端)
},
"解析整个Python项目": {
"Group": "编程",
"Color": "stop",
"AsButton": True,
"Info": "解析一个Python项目的所有源文件(.py) | 输入参数为路径",
"Function": HotReload(解析一个Python项目)
},
"载入对话历史存档(先上传存档或输入路径)": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"Info": "载入对话历史存档 | 输入参数为路径",
"Function": HotReload(载入对话历史存档)
},
"删除所有本地对话历史记录(谨慎操作)": {
"Group": "对话",
"AsButton": False,
"Info": "删除所有本地对话历史记录,谨慎操作 | 不需要输入参数",
"Function": HotReload(删除所有本地对话历史记录)
},
"清除所有缓存文件(谨慎操作)": {
"Group": "对话",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "清除所有缓存文件,谨慎操作 | 不需要输入参数",
"Function": HotReload(清除缓存)
},
"批量总结Word文档": {
"Group": "学术",
"Color": "stop",
"AsButton": True,
"Info": "批量总结word文档 | 输入参数为路径",
"Function": HotReload(总结word文档)
},
"解析整个Matlab项目": {
"Group": "编程",
"Color": "stop",
"AsButton": False,
"Info": "解析一个Matlab项目的所有源文件(.m) | 输入参数为路径",
"Function": HotReload(解析一个Matlab项目)
},
"解析整个C++项目头文件": {
"Group": "编程",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "解析一个C++项目的所有头文件(.h/.hpp) | 输入参数为路径",
"Function": HotReload(解析一个C项目的头文件)
},
"解析整个C++项目(.cpp/.hpp/.c/.h": {
"Group": "编程",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "解析一个C++项目的所有源文件(.cpp/.hpp/.c/.h| 输入参数为路径",
"Function": HotReload(解析一个C项目)
},
"解析整个Go项目": {
"Group": "编程",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "解析一个Go项目的所有源文件 | 输入参数为路径",
"Function": HotReload(解析一个Golang项目)
},
"解析整个Rust项目": {
"Group": "编程",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "解析一个Rust项目的所有源文件 | 输入参数为路径",
"Function": HotReload(解析一个Rust项目)
},
"解析整个Java项目": {
"Group": "编程",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "解析一个Java项目的所有源文件 | 输入参数为路径",
"Function": HotReload(解析一个Java项目)
},
"解析整个前端项目js,ts,css等": {
"Group": "编程",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "解析一个前端项目的所有源文件js,ts,css等 | 输入参数为路径",
"Function": HotReload(解析一个前端项目)
},
"解析整个Lua项目": {
"Group": "编程",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "解析一个Lua项目的所有源文件 | 输入参数为路径",
"Function": HotReload(解析一个Lua项目)
},
"解析整个CSharp项目": {
"Group": "编程",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "解析一个CSharp项目的所有源文件 | 输入参数为路径",
"Function": HotReload(解析一个CSharp项目)
},
"解析Jupyter Notebook文件": {
"Group": "编程",
"Color": "stop",
"AsButton": False,
"Info": "解析Jupyter Notebook文件 | 输入参数为路径",
"Function": HotReload(解析ipynb文件),
"AdvancedArgs": True, # when invoked, open the advanced-argument input area (default False)
"ArgsReminder": "若输入0,则不解析notebook中的Markdown块", # hint shown in the advanced-argument input area
},
"读Tex论文写摘要": {
"Group": "学术",
"Color": "stop",
"AsButton": False,
"Info": "读取Tex论文并写摘要 | 输入参数为路径",
"Function": HotReload(读文章写摘要)
},
"翻译README或MD": {
"Group": "编程",
"Color": "stop",
"AsButton": True,
"Info": "将Markdown翻译为中文 | 输入参数为路径或URL",
"Function": HotReload(Markdown英译中)
},
"翻译Markdown或README支持Github链接": {
"Group": "编程",
"Color": "stop",
"AsButton": False,
"Info": "将Markdown或README翻译为中文 | 输入参数为路径或URL",
"Function": HotReload(Markdown英译中)
},
"批量生成函数注释": {
"Group": "编程",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "批量生成函数的注释 | 输入参数为路径",
"Function": HotReload(批量生成函数注释)
},
"保存当前的对话": {
"Group": "对话",
"AsButton": True,
"Info": "保存当前的对话 | 不需要输入参数",
"Function": HotReload(对话历史存档)
},
"[多线程Demo]解析此项目本身(源码自译解)": {
"Group": "对话|编程",
"AsButton": False, # add to the dropdown menu
"Info": "多线程解析并翻译此项目的源码 | 不需要输入参数",
"Function": HotReload(解析项目本身)
},
"[插件demo]历史上的今天": {
"Group": "对话",
"AsButton": True,
"Info": "查看历史上的今天事件 | 不需要输入参数",
"Function": HotReload(高阶功能模板函数)
},
"精准翻译PDF论文": {
"Group": "学术",
"Color": "stop",
"AsButton": True,
"Info": "精准翻译PDF论文为中文 | 输入参数为路径",
"Function": HotReload(批量翻译PDF文档)
},
"询问多个GPT模型": {
"Color": "stop", # button color
"Group": "对话",
"Color": "stop",
"AsButton": True,
"Function": HotReload(同时问询)
},
"[测试功能] 批量总结PDF文档": {
"批量总结PDF文档": {
"Group": "学术",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
# HotReload means hot reloading: after you edit the plugin code, the change takes effect without restarting the program
"Info": "批量总结PDF文档的内容 | 输入参数为路径",
"Function": HotReload(批量总结PDF文档)
},
"[测试功能] 批量总结PDF文档pdfminer": {
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Function": HotReload(批量总结PDF文档pdfminer)
},
"谷歌学术检索助手输入谷歌学术搜索页url": {
"Group": "学术",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "使用谷歌学术检索助手搜索指定URL的结果 | 输入参数为谷歌学术搜索页的URL",
"Function": HotReload(谷歌检索小助手)
},
"理解PDF文档内容 模仿ChatPDF": {
# HotReload means hot reloading: after you edit the plugin code, the change takes effect without restarting the program
"Group": "学术",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "理解PDF文档的内容并进行回答 | 输入参数为路径",
"Function": HotReload(理解PDF文档内容标准文件输入)
},
"[测试功能] 英文Latex项目全文润色输入路径或上传压缩包": {
# HotReload means hot reloading: after you edit the plugin code, the change takes effect without restarting the program
"英文Latex项目全文润色输入路径或上传压缩包": {
"Group": "学术",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "对英文Latex项目全文进行润色处理 | 输入参数为路径或上传压缩包",
"Function": HotReload(Latex英文润色)
},
"[测试功能] 中文Latex项目全文润色(输入路径或上传压缩包)": {
# HotReload means hot reloading: after you edit the plugin code, the change takes effect without restarting the program
"英文Latex项目全文纠错(输入路径或上传压缩包)": {
"Group": "学术",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "对英文Latex项目全文进行纠错处理 | 输入参数为路径或上传压缩包",
"Function": HotReload(Latex英文纠错)
},
"中文Latex项目全文润色输入路径或上传压缩包": {
"Group": "学术",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "对中文Latex项目全文进行润色处理 | 输入参数为路径或上传压缩包",
"Function": HotReload(Latex中文润色)
},
"Latex项目全文中译英输入路径或上传压缩包": {
# HotReload means hot reloading: after you edit the plugin code, the change takes effect without restarting the program
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Function": HotReload(Latex中译英)
},
"Latex项目全文英译中输入路径或上传压缩包": {
# HotReload means hot reloading: after you edit the plugin code, the change takes effect without restarting the program
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Function": HotReload(Latex英译中)
},
# Superseded by newer plugins
# "Latex项目全文中译英输入路径或上传压缩包": {
# "Group": "学术",
# "Color": "stop",
# "AsButton": False, # add to the dropdown menu
# "Info": "对Latex项目全文进行中译英处理 | 输入参数为路径或上传压缩包",
# "Function": HotReload(Latex中译英)
# },
# "Latex项目全文英译中输入路径或上传压缩包": {
# "Group": "学术",
# "Color": "stop",
# "AsButton": False, # add to the dropdown menu
# "Info": "对Latex项目全文进行英译中处理 | 输入参数为路径或上传压缩包",
# "Function": HotReload(Latex英译中)
# },
"批量Markdown中译英输入路径或上传压缩包": {
# HotReload means hot reloading: after you edit the plugin code, the change takes effect without restarting the program
"Group": "编程",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "批量将Markdown文件中文翻译为英文 | 输入参数为路径或上传压缩包",
"Function": HotReload(Markdown中译英)
},
}
# -=--=- Experimental plugins not yet fully tested & plugins that need extra dependencies -=--=-
try:
from crazy_functions.下载arxiv论文翻译摘要 import 下载arxiv论文并翻译摘要
function_plugins.update({
"一键下载arxiv论文并翻译摘要先在input输入编号,如1812.10695": {
"Group": "学术",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
# "Info": "下载arxiv论文并翻译摘要 | 输入参数为arxiv编号如1812.10695",
"Function": HotReload(下载arxiv论文并翻译摘要)
}
})
except:
print('Load function plugin failed')
try:
from crazy_functions.联网的ChatGPT import 连接网络回答问题
function_plugins.update({
"连接网络回答问题(输入问题后点击该插件,需要访问谷歌)": {
"Group": "对话",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
# "Info": "连接网络回答问题(需要访问谷歌)| 输入参数是一个问题",
"Function": HotReload(连接网络回答问题)
}
})
from crazy_functions.联网的ChatGPT_bing版 import 连接bing搜索回答问题
function_plugins.update({
"连接网络回答问题中文Bing版,输入问题后点击该插件": {
"Group": "对话",
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Info": "连接网络回答问题需要访问中文Bing| 输入参数是一个问题",
"Function": HotReload(连接bing搜索回答问题)
}
})
except:
print('Load function plugin failed')
try:
from crazy_functions.解析项目源代码 import 解析任意code项目
function_plugins.update({
"解析项目源代码(手动指定和筛选源代码文件类型)": {
"Group": "编程",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True, # when invoked, open the advanced-argument input area (default False)
"ArgsReminder": "输入时用逗号隔开, *代表通配符, 加了^代表不匹配; 不输入代表全部匹配。例如: \"*.c, ^*.cpp, config.toml, ^*.toml\"", # hint shown in the advanced-argument input area
"Function": HotReload(解析任意code项目)
},
})
except:
print('Load function plugin failed')
try:
from crazy_functions.询问多个大语言模型 import 同时问询_指定模型
function_plugins.update({
"询问多个GPT模型手动指定询问哪些模型": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True, # when invoked, open the advanced-argument input area (default False)
"ArgsReminder": "支持任意数量的llm接口,用&符号分隔。例如chatglm&gpt-3.5-turbo&api2d-gpt-4", # hint shown in the advanced-argument input area
"Function": HotReload(同时问询_指定模型)
},
})
except:
print('Load function plugin failed')
try:
from crazy_functions.图片生成 import 图片生成
function_plugins.update({
"图片生成先切换模型到openai或api2d": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True, # when invoked, open the advanced-argument input area (default False)
"ArgsReminder": "在这里输入分辨率, 如256x256默认", # hint shown in the advanced-argument input area
"Info": "图片生成 | 输入参数字符串,提供图像的内容",
"Function": HotReload(图片生成)
},
})
except:
print('Load function plugin failed')
try:
from crazy_functions.总结音视频 import 总结音视频
function_plugins.update({
"批量总结音视频(输入路径或上传压缩包)": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True,
"ArgsReminder": "调用openai api 使用whisper-1模型, 目前支持的格式:mp4, m4a, wav, mpga, mpeg, mp3。此处可以输入解析提示,例如解析为简体中文默认",
"Info": "批量总结音频或视频 | 输入参数为路径",
"Function": HotReload(总结音视频)
}
})
except:
print('Load function plugin failed')
try:
from crazy_functions.数学动画生成manim import 动画生成
function_plugins.update({
"数学动画生成Manim": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"Info": "按照自然语言描述生成一个动画 | 输入参数是一段话",
"Function": HotReload(动画生成)
}
})
except:
print('Load function plugin failed')
try:
from crazy_functions.批量Markdown翻译 import Markdown翻译指定语言
function_plugins.update({
"Markdown翻译手动指定语言": {
"Group": "编程",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True,
"ArgsReminder": "请输入要翻译成哪种语言,默认为Chinese。",
"Function": HotReload(Markdown翻译指定语言)
}
})
except:
print('Load function plugin failed')
try:
from crazy_functions.Langchain知识库 import 知识库问答
function_plugins.update({
"构建知识库(先上传文件素材,再运行此插件)": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True,
"ArgsReminder": "此处待注入的知识库名称id, 默认为default。文件进入知识库后可长期保存。可以通过再次调用本插件的方式,向知识库追加更多文档。",
"Function": HotReload(知识库问答)
}
})
except:
print('Load function plugin failed')
try:
from crazy_functions.Langchain知识库 import 读取知识库作答
function_plugins.update({
"知识库问答(构建知识库后,再运行此插件)": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True,
"ArgsReminder": "待提取的知识库名称id, 默认为default, 您需要构建知识库后再运行此插件。",
"Function": HotReload(读取知识库作答)
}
})
except:
print('Load function plugin failed')
try:
from crazy_functions.交互功能函数模板 import 交互功能模板函数
function_plugins.update({
"交互功能模板函数": {
"Group": "对话",
"Color": "stop",
"AsButton": False,
"Function": HotReload(交互功能模板函数)
}
})
except:
print('Load function plugin failed')
try:
from crazy_functions.Latex输出PDF结果 import Latex英文纠错加PDF对比
function_plugins.update({
"Latex英文纠错+高亮修正位置 [需Latex]": {
"Group": "学术",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True,
"ArgsReminder": "如果有必要, 请在此处追加更细致的矫错指令(使用英文)。",
"Function": HotReload(Latex英文纠错加PDF对比)
}
})
from crazy_functions.Latex输出PDF结果 import Latex翻译中文并重新编译PDF
function_plugins.update({
"Arixv论文精细翻译输入arxivID[需Latex]": {
"Group": "学术",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True,
"ArgsReminder":
"如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 " +
"例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: " +
'If the term "agent" is used in this section, it should be translated to "智能体". ',
"Info": "Arxiv论文精细翻译 | 输入参数arxiv论文的ID,比如1812.10695",
"Function": HotReload(Latex翻译中文并重新编译PDF)
}
})
function_plugins.update({
"本地Latex论文精细翻译上传Latex项目[需Latex]": {
"Group": "学术",
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True,
"ArgsReminder":
"如果有必要, 请在此处给出自定义翻译命令, 解决部分词汇翻译不准确的问题。 " +
"例如当单词'agent'翻译不准确时, 请尝试把以下指令复制到高级参数区: " +
'If the term "agent" is used in this section, it should be translated to "智能体". ',
"Info": "本地Latex论文精细翻译 | 输入参数是路径",
"Function": HotReload(Latex翻译中文并重新编译PDF)
}
})
except:
print('Load function plugin failed')
try:
from toolbox import get_conf
ENABLE_AUDIO, = get_conf('ENABLE_AUDIO')
if ENABLE_AUDIO:
from crazy_functions.语音助手 import 语音助手
function_plugins.update({
"实时音频采集": {
"Group": "对话",
"Color": "stop",
"AsButton": True,
"Info": "开始语言对话 | 没有输入参数",
"Function": HotReload(语音助手)
}
})
except:
print('Load function plugin failed')
try:
from crazy_functions.批量翻译PDF文档_NOUGAT import 批量翻译PDF文档
function_plugins.update({
"精准翻译PDF文档NOUGAT": {
"Group": "学术",
"Color": "stop",
"AsButton": False,
"Function": HotReload(批量翻译PDF文档)
}
})
except:
print('Load function plugin failed')
try:
from crazy_functions.函数动态生成 import 函数动态生成
function_plugins.update({
"动态代码解释器CodeInterpreter": {
"Group": "智能体",
"Color": "stop",
"AsButton": False,
"Function": HotReload(函数动态生成)
}
})
except:
print('Load function plugin failed')
# try:
# from crazy_functions.CodeInterpreter import 虚空终端CodeInterpreter
# function_plugins.update({
# "CodeInterpreter开发中,仅供测试": {
# "Group": "编程|对话",
# "Color": "stop",
# "AsButton": False,
# "Function": HotReload(虚空终端CodeInterpreter)
# }
# })
# except:
# print('Load function plugin failed')
# try:
# from crazy_functions.chatglm微调工具 import 微调数据集生成
# function_plugins.update({
# "黑盒模型学习: 微调数据集生成 (先上传数据集)": {
# "Color": "stop",
# "AsButton": False,
# "AdvancedArgs": True,
# "ArgsReminder": "针对数据集输入(如 绿帽子*深蓝色衬衫*黑色运动裤)给出指令,例如您可以将以下命令复制到下方: --llm_to_learn=azure-gpt-3.5 --prompt_prefix='根据下面的服装类型提示,想象一个穿着者,对这个人外貌、身处的环境、内心世界、过去经历进行描写。要求100字以内,用第二人称。' --system_prompt=''",
# "Function": HotReload(微调数据集生成)
# }
# })
# except:
# print('Load function plugin failed')
})
###################### Third group of plugins ###########################
# [Third group of plugins]: function plugins that are not yet fully tested go here
from crazy_functions.下载arxiv论文翻译摘要 import 下载arxiv论文并翻译摘要
function_plugins.update({
"一键下载arxiv论文并翻译摘要先在input输入编号,如1812.10695": {
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Function": HotReload(下载arxiv论文并翻译摘要)
}
})
"""
Set default values:
- default Group = 对话 (conversation)
- default AsButton = True
- default AdvancedArgs = False
- default Color = secondary
"""
for name, function_meta in function_plugins.items():
    if "Group" not in function_meta:
        function_plugins[name]["Group"] = '对话'
    if "AsButton" not in function_meta:
        function_plugins[name]["AsButton"] = True
    if "AdvancedArgs" not in function_meta:
        function_plugins[name]["AdvancedArgs"] = False
    if "Color" not in function_meta:
        function_plugins[name]["Color"] = 'secondary'
from crazy_functions.联网的ChatGPT import 连接网络回答问题
function_plugins.update({
"连接网络回答问题(先输入问题,再点击按钮,需要访问谷歌)": {
"Color": "stop",
"AsButton": False, # add to the dropdown menu
"Function": HotReload(连接网络回答问题)
}
})
from crazy_functions.解析项目源代码 import 解析任意code项目
function_plugins.update({
"解析项目源代码(手动指定和筛选源代码文件类型)": {
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True, # when invoked, open the advanced-argument input area (default False)
"ArgsReminder": "输入时用逗号隔开, *代表通配符, 加了^代表不匹配; 不输入代表全部匹配。例如: \"*.c, ^*.cpp, config.toml, ^*.toml\"", # hint shown in the advanced-argument input area
"Function": HotReload(解析任意code项目)
},
})
from crazy_functions.询问多个大语言模型 import 同时问询_指定模型
function_plugins.update({
"询问多个GPT模型手动指定询问哪些模型": {
"Color": "stop",
"AsButton": False,
"AdvancedArgs": True, # when invoked, open the advanced-argument input area (default False)
"ArgsReminder": "支持任意数量的llm接口,用&符号分隔。例如chatglm&gpt-3.5-turbo&api2d-gpt-4", # hint shown in the advanced-argument input area
"Function": HotReload(同时问询_指定模型)
},
})
###################### Group n of plugins ###########################
return function_plugins
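The default-filling loop near the end of `get_crazy_functions()` covers Group, AsButton, AdvancedArgs and Color. You can express it with `dict.setdefault`, which writes a key only when it is missing. The following is a minimal equivalent sketch; `PLUGIN_DEFAULTS` and `fill_plugin_defaults` are hypothetical names.

```python
PLUGIN_DEFAULTS = {
    "Group": "对话",        # the "conversation" group
    "AsButton": True,
    "AdvancedArgs": False,
    "Color": "secondary",
}


def fill_plugin_defaults(function_plugins: dict[str, dict]) -> dict[str, dict]:
    """Fill in missing metadata for every plugin entry, in place, and return the dict."""
    for meta in function_plugins.values():
        for key, value in PLUGIN_DEFAULTS.items():
            meta.setdefault(key, value)  # existing values are never overwritten
    return function_plugins
```

Keeping the defaults in one table means that adding a new metadata field with a default is a one-line change.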


@@ -0,0 +1,232 @@
from collections.abc import Callable, Iterable, Mapping
from typing import Any
from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc
from toolbox import promote_file_to_downloadzone, get_log_folder
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import input_clipping, try_install_deps
from multiprocessing import Process, Pipe
import os
import time
templete = """
```python
import ... # Put dependencies here, e.g. import numpy as np
class TerminalFunction(object): # Do not change the name of the class. The name of the class must be `TerminalFunction`
    def run(self, path):    # The name of the function must be `run`; it takes only one positional argument.
        # rewrite the function you have just written here
        ...
        return generated_file_path
```
"""
def inspect_dependency(chatbot, history):
    yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
    return True
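def get_code_block(reply):
    import re
    pattern = r"```([\s\S]*?)```" # regex pattern to match code blocks
    matches = re.findall(pattern, reply) # find all code blocks in text

    def strip_lang_tag(block):
        # Remove a leading "python" language tag. str.strip('python') would instead strip
        # any of the characters p/y/t/h/o/n from BOTH ends of the code.
        block = block.strip()
        if block.startswith('python'):
            block = block[len('python'):]
        return block

    if len(matches) == 1:
        return strip_lang_tag(matches[0]) # code block
    for match in matches:
        if 'class TerminalFunction' in match:
            return strip_lang_tag(match) # code block
    raise RuntimeError("GPT is not generating proper code.")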
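`str.strip('python')` treats its argument as a set of characters, not as a prefix. It therefore removes the tag at the start, but it can also remove a trailing `n`, `o`, `h`, `t`, `y` or `p` from the code itself. The following is a minimal sketch of fence extraction that drops only an optional language tag. The regex and the name `extract_code_blocks` are illustrative, not the project's.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime so this snippet nests safely in Markdown
# A fence, an optional language tag such as "python", a newline, then the body up to the next fence.
FENCE_RE = re.compile(FENCE + r"([\w+-]*)[ \t]*\n([\s\S]*?)" + FENCE)


def extract_code_blocks(reply: str) -> list[str]:
    """Return the bodies of all fenced code blocks in `reply`, without their language tags."""
    return [body for _lang, body in FENCE_RE.findall(reply)]


if __name__ == "__main__":
    # The pitfall: the leading tag goes, but so does the final "n" of "return".
    print(repr("python\nreturn".strip("python")))
```

Capturing the tag in its own group keeps it out of the body. If you need the language later, it is still available as the first element of each match.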
def gpt_interact_multi_step(txt, file_type, llm_kwargs, chatbot, history):
# 输入
prompt_compose = [
f'Your job:\n'
f'1. write a single Python function, which takes a path of a `{file_type}` file as the only argument and returns a `string` containing the result of analysis or the path of generated files. \n',
f"2. You should write this function to perform following task: " + txt + "\n",
f"3. Wrap the output python function with markdown codeblock."
]
i_say = "".join(prompt_compose)
demo = []
# 第一步
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say, inputs_show_user=i_say,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=demo,
sys_prompt= r"You are a programmer."
)
history.extend([i_say, gpt_say])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# step 2
prompt_compose = [
"If the previous stage was successful, rewrite the function you have just written to satisfy the following template: \n",
templete
]
i_say = "".join(prompt_compose); inputs_show_user = "If the previous stage was successful, rewrite the function you have just written to satisfy the executable template. "
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say, inputs_show_user=inputs_show_user,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
sys_prompt= r"You are a programmer."
)
code_to_return = gpt_say
history.extend([i_say, gpt_say])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# # step 3
# i_say = "Please list the packages to install to run the code above. Then show me how to use the `try_install_deps` function to install them. "
# i_say += 'For instance, `try_install_deps(["opencv-python", "scipy", "numpy"])`'
# installation_advance = yield from request_gpt_model_in_new_thread_with_ui_alive(
# inputs=i_say, inputs_show_user=inputs_show_user,
# llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
# sys_prompt= r"You are a programmer."
# )
# # # step 3
# i_say = "Show me how to use `pip` to install packages to run the code above. "
# i_say += 'For instance, `pip install opencv-python scipy numpy`'
# installation_advance = yield from request_gpt_model_in_new_thread_with_ui_alive(
# inputs=i_say, inputs_show_user=i_say,
# llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
# sys_prompt= r"You are a programmer."
# )
installation_advance = ""
return code_to_return, installation_advance, txt, file_type, llm_kwargs, chatbot, history
def make_module(code):
module_file = 'gpt_fn_' + gen_time_str().replace('-','_')
with open(f'{get_log_folder()}/{module_file}.py', 'w', encoding='utf8') as f:
f.write(code)
def get_class_name(class_string):
import re
# Use regex to extract the class name
class_name = re.search(r'class (\w+)\(', class_string).group(1)
return class_name
class_name = get_class_name(code)
return f"{get_log_folder().replace('/', '.')}.{module_file}->{class_name}"
def init_module_instance(module):
import importlib
module_, class_ = module.split('->')
init_f = getattr(importlib.import_module(module_), class_)
return init_f()
def for_immediate_show_off_when_possible(file_type, fp, chatbot):
if file_type in ['png', 'jpg']:
image_path = os.path.abspath(fp)
chatbot.append(['这是一张图片, 展示如下:',
f'本地文件地址: <br/>`{image_path}`<br/>'+
f'本地文件预览: <br/><div align="center"><img src="file={image_path}"></div>'
])
return chatbot
def subprocess_worker(instance, file_path, return_dict):
return_dict['result'] = instance.run(file_path)
def have_any_recent_upload_files(chatbot):
_5min = 5 * 60
if not chatbot: return False # chatbot is None
most_recent_uploaded = chatbot._cookies.get("most_recent_uploaded", None)
if not most_recent_uploaded: return False # most_recent_uploaded is None
if time.time() - most_recent_uploaded["time"] < _5min: return True # most_recent_uploaded is new
else: return False # most_recent_uploaded is too old
def get_recent_file_prompt_support(chatbot):
most_recent_uploaded = chatbot._cookies.get("most_recent_uploaded", None)
path = most_recent_uploaded['path']
return path
@CatchException
def 虚空终端CodeInterpreter(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
"""
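txt text typed into the input box, e.g. a passage to translate, or a path containing the files to process
llm_kwargs GPT model parameters such as temperature and top_p; usually passed through unchanged
plugin_kwargs plugin parameters (here: `recently_uploaded_files`)
chatbot handle of the chat display box, used to show output to the user
history chat history (prior context)
system_prompt silent system prompt given to GPT
web_port port the application is currently running on
"""
raise NotImplementedError
# clear the history so the model input does not overflow
history = []; clear_file_downloadzone(chatbot)
# basic info: feature and contributors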
chatbot.append([
"函数插件功能?",
"CodeInterpreter开源版, 此插件处于开发阶段, 建议暂时不要使用, 插件初始化中 ..."
])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
if have_any_recent_upload_files(chatbot):
file_path = get_recent_file_prompt_support(chatbot)
else:
chatbot.append(["文件检索", "没有发现任何近期上传的文件。"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# read the file
if ("recently_uploaded_files" in plugin_kwargs) and (plugin_kwargs["recently_uploaded_files"] == ""): plugin_kwargs.pop("recently_uploaded_files")
recently_uploaded_files = plugin_kwargs.get("recently_uploaded_files", None)
file_path = recently_uploaded_files[-1]
file_type = file_path.split('.')[-1]
# carelessness check: the input box still holds the upload path instead of a request
if is_the_upload_folder(txt):
chatbot.append([
"...",
f"请在输入框内填写需求,然后再次点击该插件(文件路径 {file_path} 已经被记忆)"
])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# start the actual work
for j in range(5): # retry at most 5 times
try:
code, installation_advance, txt, file_type, llm_kwargs, chatbot, history = \
yield from gpt_interact_multi_step(txt, file_type, llm_kwargs, chatbot, history)
code = get_code_block(code)
res = make_module(code)
instance = init_module_instance(res)
break
except Exception as e:
chatbot.append([f"{j}次代码生成尝试,失败了", f"错误追踪\n```\n{trimmed_format_exc()}\n```\n"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# code generation finished; start execution
try:
import multiprocessing
manager = multiprocessing.Manager()
return_dict = manager.dict()
p = multiprocessing.Process(target=subprocess_worker, args=(instance, file_path, return_dict))
# give the child process at most 10 seconds to run
p.start(); p.join(timeout=10)
if p.is_alive(): p.terminate(); p.join()
p.close()
res = return_dict['result']
# res = instance.run(file_path)
except Exception as e:
chatbot.append(["执行失败了", f"错误追踪\n```\n{trimmed_format_exc()}\n```\n"])
# chatbot.append(["如果是缺乏依赖,请参考以下建议", installation_advance])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# finished successfully; wrap up
res = str(res)
if os.path.exists(res):
chatbot.append(["执行成功了,结果是一个有效文件", "结果:" + res])
new_file_path = promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot = for_immediate_show_off_when_possible(file_type, new_file_path, chatbot)
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
else:
chatbot.append(["执行成功了,结果是一个字符串", "结果:" + res])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
"""
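Test cases:
Crop the image, keeping the lower half
Swap the blue and red channels of the image
Convert the image to grayscale
Convert a CSV file into an Excel spreadsheet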
"""
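The execution step in 虚空终端CodeInterpreter runs the generated `TerminalFunction.run` in a child process, waits up to 10 seconds, and reads the result from a `multiprocessing.Manager` dict. A minimal standalone sketch of that pattern follows; `run_with_timeout`, `double` and `sleepy` are hypothetical names. It returns `None` instead of raising when the child is killed before writing a result, and the demo calls sit under `if __name__ == '__main__':` because platforms using the `spawn` start method re-import the main module in each child.

```python
import multiprocessing
import time


def _worker(fn, arg, return_dict):
    # Runs in the child process and stores the result in the shared dict.
    return_dict['result'] = fn(arg)


def run_with_timeout(fn, arg, timeout=10.0):
    """Run fn(arg) in a child process; return its result, or None if it did not finish in time."""
    with multiprocessing.Manager() as manager:
        return_dict = manager.dict()
        p = multiprocessing.Process(target=_worker, args=(fn, arg, return_dict))
        p.start()
        p.join(timeout=timeout)
        if p.is_alive():
            # The child overran its budget: kill it, then reap it.
            p.terminate()
            p.join()
        p.close()
        # .get avoids a KeyError when the child exited without writing a result.
        return return_dict.get('result')


def double(x):
    return 2 * x


def sleepy(seconds):
    time.sleep(seconds)
    return 'done'


if __name__ == '__main__':
    print(run_with_timeout(double, 21, timeout=5))    # prints 42
    print(run_with_timeout(sleepy, 5, timeout=0.5))   # prints None
```

Using a separate process rather than a thread is what makes the hard timeout possible: a process can be terminated from outside, while a Python thread cannot.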

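`get_code_block` pulls fenced blocks out of the model reply and prefers the one defining `TerminalFunction`. Dropping the language tag needs care: `str.strip('python')` treats its argument as a set of characters, so it is not a safe way to remove a `python` prefix (`'print(1)'.strip('python')` gives `'rint(1)'`). A sketch of tag-aware extraction using only the `re` module; `extract_code_blocks` and `pick_terminal_function` are hypothetical helpers.

```python
import re

# Matches ```lang\n...``` fences; group 1 is the optional language tag, group 2 the body.
FENCE = re.compile(r"```([A-Za-z0-9_+-]*)[ \t]*\n([\s\S]*?)```")


def extract_code_blocks(reply):
    """Return (language, body) pairs for every fenced block in reply."""
    return FENCE.findall(reply)


def pick_terminal_function(reply):
    """Return the only block, or else the block that defines TerminalFunction."""
    blocks = [body for _, body in extract_code_blocks(reply)]
    if len(blocks) == 1:
        return blocks[0]
    for body in blocks:
        if 'class TerminalFunction' in body:
            return body
    raise RuntimeError("no usable code block in reply")


reply = "Here you go:\n```python\nprint('hi')\n```\n"
print(repr(pick_terminal_function(reply)))  # prints "print('hi')\n"
```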

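`make_module` writes the generated code into the log folder, and `init_module_instance` then imports it by a dotted path built from that folder, which assumes the log folder is importable as a package from the working directory. A sketch that loads the file by path with `importlib.util` instead, sidestepping that assumption; `load_class_from_file` and the module name `generated_fn` are hypothetical, and the class-name regex also accepts `class Foo:` without parentheses.

```python
import importlib.util
import os
import re
import tempfile


def load_class_from_file(code, directory):
    """Write code to a .py file in directory, import it by path, and instantiate its first class."""
    class_name = re.search(r'class (\w+)\s*[(:]', code).group(1)
    path = os.path.join(directory, 'generated_fn.py')
    with open(path, 'w', encoding='utf8') as f:
        f.write(code)
    # Build a module spec from the file location; no sys.path or package layout needed.
    spec = importlib.util.spec_from_file_location('generated_fn', path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)()


code = (
    "class TerminalFunction(object):\n"
    "    def run(self, path):\n"
    "        return 'processed ' + path\n"
)
with tempfile.TemporaryDirectory() as d:
    instance = load_class_from_file(code, d)
    print(instance.run('a.png'))  # prints processed a.png
```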
@@ -0,0 +1,106 @@
from toolbox import CatchException, update_ui, ProxyNetworkActivate, update_ui_lastest_msg
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, get_files_from_everything
@CatchException
def 知识库问答(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
"""
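txt text typed into the input box, e.g. a passage to translate, or a path containing the files to process
llm_kwargs GPT model parameters such as temperature and top_p; usually passed through unchanged
plugin_kwargs plugin parameters (here: `advanced_arg` selects the knowledge-base ID)
chatbot handle of the chat display box, used to show output to the user
history chat history (prior context)
system_prompt silent system prompt given to GPT
web_port port the application is currently running on
"""
history = [] # clear the history so the model input does not overflow
# < --------------------read arguments--------------- >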
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
kai_id = plugin_kwargs.get("advanced_arg", 'default')
chatbot.append((f"向`{kai_id}`知识库中添加文件。", "[Local Message] 从一批文件(txt, md, tex)中读取数据构建知识库, 然后进行问答。"))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# resolve deps
try:
from zh_langchain import construct_vector_store
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from .crazy_utils import knowledge_archive_interface
except Exception as e:
chatbot.append(["依赖不足", "导入依赖失败。正在尝试自动安装,请查看终端的输出或耐心等待..."])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
from .crazy_utils import try_install_deps
try_install_deps(['zh_langchain==0.2.1', 'pypinyin'], reload_m=['pypinyin', 'zh_langchain'])
yield from update_ui_lastest_msg("安装完成,您可以再次重试。", chatbot, history)
return
# < --------------------read files--------------- >
file_manifest = []
spl = ["txt", "doc", "docx", "email", "epub", "html", "json", "md", "msg", "pdf", "ppt", "pptx", "rtf"]
for sp in spl:
_, file_manifest_tmp, _ = get_files_from_everything(txt, type=f'.{sp}')
file_manifest += file_manifest_tmp
if len(file_manifest) == 0:
chatbot.append(["没有找到任何可读取文件", "当前支持的格式包括: txt, md, docx, pptx, pdf, json等"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# < -------------------warm up the text-embedding module--------------- >
chatbot.append(['<br/>'.join(file_manifest), "正在预热文本向量化模组, 如果是第一次运行, 将消耗较长时间下载中文向量化模型..."])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
print('Checking Text2vec ...')
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
with ProxyNetworkActivate('Download_LLM'): # temporarily enable the proxy
HuggingFaceEmbeddings(model_name="GanymedeNil/text2vec-large-chinese")
# < -------------------build the knowledge base--------------- >
chatbot.append(['<br/>'.join(file_manifest), "正在构建知识库..."])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
print('Establishing knowledge archive ...')
with ProxyNetworkActivate('Download_LLM'): # temporarily enable the proxy
kai = knowledge_archive_interface()
kai.feed_archive(file_manifest=file_manifest, id=kai_id)
kai_files = kai.get_loaded_file()
kai_files = '<br/>'.join(kai_files)
# chatbot.append(['知识库构建成功', "正在将知识库存储至cookie中"])
# yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# chatbot._cookies['langchain_plugin_embedding'] = kai.get_current_archive_id()
# chatbot._cookies['lock_plugin'] = 'crazy_functions.Langchain知识库->读取知识库作答'
# chatbot.append(['完成', "“根据知识库作答”函数插件已经接管问答系统, 提问吧! 但注意, 您接下来不能再使用其他插件了,刷新页面即可以退出知识库问答模式。"])
chatbot.append(['构建完成', f"当前知识库内的有效文件:\n\n---\n\n{kai_files}\n\n---\n\n请切换至“知识库问答”插件进行知识库访问, 或者使用此插件继续上传更多文件。"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
@CatchException
def 读取知识库作答(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port=-1):
# resolve deps
try:
from zh_langchain import construct_vector_store
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from .crazy_utils import knowledge_archive_interface
except Exception as e:
chatbot.append(["依赖不足", "导入依赖失败。正在尝试自动安装,请查看终端的输出或耐心等待..."])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
from .crazy_utils import try_install_deps
try_install_deps(['zh_langchain==0.2.1', 'pypinyin'], reload_m=['pypinyin', 'zh_langchain'])
yield from update_ui_lastest_msg("安装完成,您可以再次重试。", chatbot, history)
return
# < ------------------- --------------- >
kai = knowledge_archive_interface()
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
kai_id = plugin_kwargs.get("advanced_arg", 'default')
resp, prompt = kai.answer_with_archive_by_id(txt, kai_id)
chatbot.append((txt, f'[知识库 {kai_id}] ' + prompt))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI now, since the GPT request below takes a while
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=prompt, inputs_show_user=txt,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=[],
sys_prompt=system_prompt
)
history.extend((prompt, gpt_say))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
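知识库问答 gathers candidate files by calling `get_files_from_everything` once per extension in `spl`. That helper's internals are not shown here, so the sketch below is an assumption-labeled stand-in that performs the same kind of collection for a local folder with `pathlib`. `collect_files` is hypothetical, matches suffixes case-insensitively, and does not handle URLs or the other input forms the real helper may accept.

```python
import tempfile
from pathlib import Path

# Extensions taken from the plugin's `spl` list.
SUPPORTED = ["txt", "doc", "docx", "email", "epub", "html", "json", "md",
             "msg", "pdf", "ppt", "pptx", "rtf"]


def collect_files(folder, extensions=SUPPORTED):
    """Recursively collect files under folder whose suffix is in extensions (case-insensitive)."""
    wanted = {'.' + ext.lower() for ext in extensions}
    return sorted(str(p) for p in Path(folder).rglob('*')
                  if p.is_file() and p.suffix.lower() in wanted)


with tempfile.TemporaryDirectory() as d:
    (Path(d) / 'notes.md').write_text('# notes', encoding='utf8')
    (Path(d) / 'sub').mkdir()
    (Path(d) / 'sub' / 'paper.PDF').write_bytes(b'%PDF-1.4')
    (Path(d) / 'figure.png').write_bytes(b'')
    names = [Path(f).name for f in collect_files(d)]
    print(names)  # prints ['notes.md', 'paper.PDF']
```

A single recursive walk with a set lookup avoids scanning the tree once per extension.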


@@ -1,6 +1,6 @@
from toolbox import update_ui
from toolbox import CatchException, report_execption, write_results_to_file
fast_debug = False
from toolbox import update_ui, trimmed_format_exc, promote_file_to_downloadzone, get_log_folder
from toolbox import CatchException, report_execption, write_history_to_file, zip_folder
class PaperFileGroup():
def __init__(self):
@@ -34,8 +34,27 @@ class PaperFileGroup():
self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.tex")
print('Segmentation: done')
def merge_result(self):
self.file_result = ["" for _ in range(len(self.file_paths))]
for r, k in zip(self.sp_file_result, self.sp_file_index):
self.file_result[k] += r
def 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en'):
def write_result(self):
manifest = []
for path, res in zip(self.file_paths, self.file_result):
with open(path + '.polish.tex', 'w', encoding='utf8') as f:
manifest.append(path + '.polish.tex')
f.write(res)
return manifest
def zip_result(self):
import os, time
folder = os.path.dirname(self.file_paths[0])
t = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())
zip_folder(folder, get_log_folder(), f'{t}-polished.zip')
def 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en', mode='polish'):
import time, os, re
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
@@ -47,7 +66,7 @@ def 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
with open(fp, 'r', encoding='utf-8', errors='replace') as f:
file_content = f.read()
# regex that matches LaTeX comments
comment_pattern = r'%.*'
comment_pattern = r'(?<!\\)%.*'
# find comments with the regex and replace them with empty strings
clean_tex_content = re.sub(comment_pattern, '', file_content)
# record the text with comments removed
@@ -58,28 +77,27 @@ def 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
pfg.run_file_split(max_token_limit=1024)
n_split = len(pfg.sp_file_contents)
# <-------- extract the abstract ---------->
# if language == 'en':
# abs_extract_inputs = f"Please write an abstract for this paper"
# # single thread: fetch the paper's meta information
# paper_meta_info = yield from request_gpt_model_in_new_thread_with_ui_alive(
# inputs=abs_extract_inputs,
# inputs_show_user=f"正在抽取摘要信息。",
# llm_kwargs=llm_kwargs,
# chatbot=chatbot, history=[],
# sys_prompt="Your job is to collect information from materials.",
# )
# <-------- multi-threaded polishing starts ---------->
if language == 'en':
inputs_array = ["Below is a section from an academic paper, polish this section to meet the academic standard, improve the grammar, clarity and overall readability, do not modify any latex command such as \section, \cite and equations:" +
if mode == 'polish':
inputs_array = ["Below is a section from an academic paper, polish this section to meet the academic standard, " +
"improve the grammar, clarity and overall readability, do not modify any latex command such as \section, \cite and equations:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
else:
inputs_array = [r"Below is a section from an academic paper, proofread this section. " +
r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " +
r"Answer me only with the revised text:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
inputs_show_user_array = [f"Polish {f}" for f in pfg.sp_file_tag]
sys_prompt_array = ["You are a professional academic paper writer." for _ in range(n_split)]
elif language == 'zh':
inputs_array = [f"以下是一篇学术论文中的一段内容,请将此部分润色以满足学术标准,提高语法、清晰度和整体可读性,不要修改任何LaTeX命令,例如\section,\cite和方程式" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
if mode == 'polish':
inputs_array = [f"以下是一篇学术论文中的一段内容,请将此部分润色以满足学术标准,提高语法、清晰度和整体可读性,不要修改任何LaTeX命令,例如\section,\cite和方程式" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
else:
inputs_array = [f"以下是一篇学术论文中的一段内容,请对这部分内容进行语法矫正。不要修改任何LaTeX命令,例如\section,\cite和方程式" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
inputs_show_user_array = [f"润色 {f}" for f in pfg.sp_file_tag]
sys_prompt_array=["你是一位专业的中文学术论文作家。" for _ in range(n_split)]
@@ -95,9 +113,22 @@ def 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
scroller_max_len = 80
)
# <-------- reassemble text fragments into complete .tex files and pack the results into a zip archive ---------->
try:
pfg.sp_file_result = []
for i_say, gpt_say in zip(gpt_response_collection[0::2], gpt_response_collection[1::2]):
pfg.sp_file_result.append(gpt_say)
pfg.merge_result()
pfg.write_result()
pfg.zip_result()
except:
print(trimmed_format_exc())
# <-------- collect results and exit ---------->
create_report_file_name = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime()) + f"-chatgpt.polish.md"
res = write_results_to_file(gpt_response_collection, file_name=create_report_file_name)
res = write_history_to_file(gpt_response_collection, file_basename=create_report_file_name)
promote_file_to_downloadzone(res, chatbot=chatbot)
history = gpt_response_collection
chatbot.append((f"{fp}完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
@@ -108,7 +139,7 @@ def Latex英文润色(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_p
# basic info: feature and contributors
chatbot.append([
"函数插件功能?",
"对整个Latex项目进行润色。函数插件贡献者: Binary-Husky"])
"对整个Latex项目进行润色。函数插件贡献者: Binary-Husky注意,此插件不调用Latex,如果有Latex环境,请使用“Latex英文纠错+高亮”插件)"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# try importing dependencies; if any are missing, suggest how to install them
@@ -172,4 +203,43 @@ def Latex中文润色(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_p
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.tex文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
yield from 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='zh')
yield from 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='zh')
@CatchException
def Latex英文纠错(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
# basic info: feature and contributors
chatbot.append([
"函数插件功能?",
"对整个Latex项目进行纠错。函数插件贡献者: Binary-Husky"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# try importing dependencies; if any are missing, suggest how to install them
try:
import tiktoken
except:
report_execption(chatbot, history,
a=f"解析项目: {txt}",
b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade tiktoken```。")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
history = [] # clear the history so the model input does not overflow
import glob, os
if os.path.exists(txt):
project_folder = txt
else:
if txt == "": txt = '空空如也的输入栏'
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
if len(file_manifest) == 0:
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.tex文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
yield from 多文件润色(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en', mode='proofread')
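Both LaTeX plugins strip comments before splitting the text, and the pattern changes from `%.*` to `(?<!\\)%.*` so that an escaped percent sign (`\%`) is no longer treated as the start of a comment. The sketch compares the two patterns on one line. The lookbehind still skips a real comment that follows a `\\` line break, so the sketch also shows a hypothetical refinement, `EVEN`, that is not part of the plugin.

```python
import re

OLD = r'%.*'          # earlier pattern: every % starts a comment
NEW = r'(?<!\\)%.*'   # updated pattern: skip % preceded by a backslash


def strip_comments(tex, pattern):
    return re.sub(pattern, '', tex)


line = r'accuracy rose by 5\% overall % TODO: cite'
print(strip_comments(line, OLD))  # prints: accuracy rose by 5\ (the escaped % is lost)
print(strip_comments(line, NEW))  # prints: accuracy rose by 5\% overall (comment removed)

# Hypothetical refinement: a % after an even run of backslashes (e.g. \\%) still starts a comment.
EVEN = r'(?<!\\)((?:\\\\)*)%.*'
print(re.sub(EVEN, r'\1', r'end of row \\% comment'))  # prints: end of row \\
```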

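The new `merge_result` concatenates each processed fragment back onto the file it came from, using the parallel `sp_file_index` list recorded during splitting. A reduced sketch of that bookkeeping with plain lists; `FragmentGroup` and its naive blank-line splitter are hypothetical simplifications (the real `run_file_split` splits by a token limit).

```python
class FragmentGroup:
    """Hypothetical, simplified stand-in for PaperFileGroup's split/merge bookkeeping."""

    def __init__(self, file_contents):
        self.file_contents = file_contents
        self.sp_file_contents = []  # fragments sent to the model
        self.sp_file_index = []     # index of the source file for each fragment
        self.sp_file_result = []    # processed fragments, same order as sp_file_contents

    def split(self):
        # Naive split on blank lines; re-append the separator so fragments can be concatenated back.
        for k, text in enumerate(self.file_contents):
            for part in text.split('\n\n'):
                self.sp_file_contents.append(part + '\n\n')
                self.sp_file_index.append(k)

    def merge(self):
        # Same idea as merge_result: append each result onto its source file's slot.
        merged = ["" for _ in self.file_contents]
        for r, k in zip(self.sp_file_result, self.sp_file_index):
            merged[k] += r
        return merged


g = FragmentGroup(["a\n\nb", "c"])
g.split()
g.sp_file_result = [frag.upper() for frag in g.sp_file_contents]  # pretend model output
print(g.merge())  # prints ['A\n\nB\n\n', 'C\n\n']
```

Because results are matched to files by position, the model responses must come back in the same order as the fragments were sent.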

@@ -1,5 +1,5 @@
from toolbox import update_ui
from toolbox import CatchException, report_execption, write_results_to_file
from toolbox import update_ui, promote_file_to_downloadzone
from toolbox import CatchException, report_execption, write_history_to_file
fast_debug = False
class PaperFileGroup():
@@ -46,7 +46,7 @@ def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
with open(fp, 'r', encoding='utf-8', errors='replace') as f:
file_content = f.read()
# regex that matches LaTeX comments
comment_pattern = r'%.*'
comment_pattern = r'(?<!\\)%.*'
# find comments with the regex and replace them with empty strings
clean_tex_content = re.sub(comment_pattern, '', file_content)
# record the text with comments removed
@@ -95,7 +95,8 @@ def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
# <-------- collect results and exit ---------->
create_report_file_name = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime()) + f"-chatgpt.polish.md"
res = write_results_to_file(gpt_response_collection, file_name=create_report_file_name)
res = write_history_to_file(gpt_response_collection, create_report_file_name)
promote_file_to_downloadzone(res, chatbot=chatbot)
history = gpt_response_collection
chatbot.append((f"{fp}完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI


@@ -0,0 +1,303 @@
from toolbox import update_ui, trimmed_format_exc, get_conf, get_log_folder, promote_file_to_downloadzone
from toolbox import CatchException, report_execption, update_ui_lastest_msg, zip_result, gen_time_str
from functools import partial
import glob, os, requests, time
pj = os.path.join
ARXIV_CACHE_DIR = os.path.expanduser(f"~/arxiv_cache/")
# =================================== utility functions ===============================================
# 专业词汇声明 = 'If the term "agent" is used in this section, it should be translated to "智能体". '
def switch_prompt(pfg, mode, more_requirement):
"""
Generate prompts and system prompts based on the mode for proofreading or translating.
Args:
- pfg: A paper-file-group object whose `sp_file_contents` holds the text fragments to process.
- mode: A string specifying the mode, either 'proofread_en' or 'translate_zh'.
- more_requirement: Extra user instructions inserted into every prompt.
Returns:
- inputs_array: A list of prompts, one per fragment, to send to the LLM.
- sys_prompt_array: A list of system prompts, one per fragment.
"""
n_split = len(pfg.sp_file_contents)
if mode == 'proofread_en':
inputs_array = [r"Below is a section from an academic paper, proofread this section. " +
r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " + more_requirement +
r"Answer me only with the revised text:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
sys_prompt_array = ["You are a professional academic paper writer." for _ in range(n_split)]
elif mode == 'translate_zh':
inputs_array = [r"Below is a section from an English academic paper, translate it into Chinese. " + more_requirement +
r"Do not modify any latex command such as \section, \cite, \begin, \item and equations. " +
r"Answer me only with the translated text:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
sys_prompt_array = ["You are a professional translator." for _ in range(n_split)]
else:
assert False, "未知指令"
return inputs_array, sys_prompt_array
def desend_to_extracted_folder_if_exist(project_folder):
"""
Descend into the extracted folder if it exists, otherwise return the original folder.
Args:
- project_folder: A string specifying the folder path.
Returns:
- A string specifying the path to the extracted folder, or the original folder if there is no extracted folder.
"""
maybe_dir = [f for f in glob.glob(f'{project_folder}/*') if os.path.isdir(f)]
if len(maybe_dir) == 0: return project_folder
if maybe_dir[0].endswith('.extract'): return maybe_dir[0]
return project_folder
def move_project(project_folder, arxiv_id=None):
"""
Create a new work folder and copy the project folder to it.
Args:
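- project_folder: A string specifying the folder path of the project.
- arxiv_id: Optional arXiv ID; if given, the work folder is created under ARXIV_CACHE_DIR/<arxiv_id>/workfolder.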
Returns:
- A string specifying the path to the new work folder.
"""
import shutil, time
time.sleep(2) # avoid time string conflict
if arxiv_id is not None:
new_workfolder = pj(ARXIV_CACHE_DIR, arxiv_id, 'workfolder')
else:
new_workfolder = f'{get_log_folder()}/{gen_time_str()}'
try:
shutil.rmtree(new_workfolder)
except:
pass
# align subfolder if there is a folder wrapper
items = glob.glob(pj(project_folder,'*'))
if len(glob.glob(pj(project_folder,'*.tex'))) == 0 and len(items) == 1:
if os.path.isdir(items[0]): project_folder = items[0]
shutil.copytree(src=project_folder, dst=new_workfolder)
return new_workfolder
def arxiv_download(chatbot, history, txt, allow_cache=True):
def check_cached_translation_pdf(arxiv_id):
translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'translation')
if not os.path.exists(translation_dir):
os.makedirs(translation_dir)
target_file = pj(translation_dir, 'translate_zh.pdf')
if os.path.exists(target_file):
promote_file_to_downloadzone(target_file, rename_file=None, chatbot=chatbot)
return target_file
return False
def is_float(s):
try:
float(s)
return True
except ValueError:
return False
if ('.' in txt) and ('/' not in txt) and is_float(txt): # is arxiv ID
txt = 'https://arxiv.org/abs/' + txt.strip()
if ('.' in txt) and ('/' not in txt) and is_float(txt[:10]): # is arxiv ID
txt = 'https://arxiv.org/abs/' + txt[:10]
if not txt.startswith('https://arxiv.org'):
return txt, None
# <-------------- inspect format ------------->
chatbot.append([f"检测到arxiv文档连接", '尝试下载 ...'])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
time.sleep(1)
url_ = txt # https://arxiv.org/abs/1707.06690
if not txt.startswith('https://arxiv.org/abs/'):
msg = f"解析arxiv网址失败, 期望格式例如: https://arxiv.org/abs/1707.06690。实际得到格式: {url_}"
yield from update_ui_lastest_msg(msg, chatbot=chatbot, history=history) # refresh the UI
return msg, None
# <-------------- set format ------------->
arxiv_id = url_.split('/abs/')[-1]
if 'v' in arxiv_id: arxiv_id = arxiv_id[:10]
cached_translation_pdf = check_cached_translation_pdf(arxiv_id)
if cached_translation_pdf and allow_cache: return cached_translation_pdf, arxiv_id
url_tar = url_.replace('/abs/', '/e-print/')
translation_dir = pj(ARXIV_CACHE_DIR, arxiv_id, 'e-print')
extract_dst = pj(ARXIV_CACHE_DIR, arxiv_id, 'extract')
os.makedirs(translation_dir, exist_ok=True)
# <-------------- download arxiv source file ------------->
dst = pj(translation_dir, arxiv_id+'.tar')
if os.path.exists(dst):
yield from update_ui_lastest_msg("调用缓存", chatbot=chatbot, history=history) # refresh the UI
else:
yield from update_ui_lastest_msg("开始下载", chatbot=chatbot, history=history) # refresh the UI
proxies, = get_conf('proxies')
r = requests.get(url_tar, proxies=proxies)
with open(dst, 'wb+') as f:
f.write(r.content)
# <-------------- extract file ------------->
yield from update_ui_lastest_msg("下载完成", chatbot=chatbot, history=history) # refresh the UI
from toolbox import extract_archive
extract_archive(file_path=dst, dest_dir=extract_dst)
return extract_dst, arxiv_id
# ========================================= plugin entry point 1 =====================================================
@CatchException
def Latex英文纠错加PDF对比(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
# <-------------- information about this plugin ------------->
chatbot.append([ "函数插件功能?",
"对整个Latex项目进行纠错, 用latex编译为PDF对修正处做高亮。函数插件贡献者: Binary-Husky。注意事项: 目前仅支持GPT3.5/GPT4,其他模型转化效果未知。目前对机器学习类文献转化效果最好,其他类型文献转化效果未知。仅在Windows系统进行了测试,其他操作系统表现未知。"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# <-------------- more requirements ------------->
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
more_req = plugin_kwargs.get("advanced_arg", "")
_switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
# <-------------- check deps ------------->
try:
import glob, os, time, subprocess
subprocess.Popen(['pdflatex', '-version'])
from .latex_fns.latex_actions import Latex精细分解与转化, 编译Latex
except Exception as e:
chatbot.append([ f"解析项目: {txt}",
f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# <-------------- clear history and read input ------------->
history = []
if os.path.exists(txt):
project_folder = txt
else:
if txt == "": txt = '空空如也的输入栏'
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
if len(file_manifest) == 0:
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.tex文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# <-------------- if is a zip/tar file ------------->
project_folder = desend_to_extracted_folder_if_exist(project_folder)
# <-------------- move latex project away from temp folder ------------->
project_folder = move_project(project_folder, arxiv_id=None)
# <-------------- if merge_proofread_en is already generated, skip the GPT requests ------------->
if not os.path.exists(project_folder + '/merge_proofread_en.tex'):
yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
chatbot, history, system_prompt, mode='proofread_en', switch_prompt=_switch_prompt_)
# <-------------- compile PDF ------------->
success = yield from 编译Latex(chatbot, history, main_file_original='merge', main_file_modified='merge_proofread_en',
work_folder_original=project_folder, work_folder_modified=project_folder, work_folder=project_folder)
# <-------------- zip PDF ------------->
zip_res = zip_result(project_folder)
if success:
chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # refresh the UI
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
else:
chatbot.append((f"失败了", '虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 也是可读的, 您可以到Github Issue区, 用该压缩包+对话历史存档进行反馈 ...'))
yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # refresh the UI
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
# <-------------- we are done ------------->
return success
# ========================================= plugin entry point 2 =====================================================
@CatchException
def Latex翻译中文并重新编译PDF(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
# <-------------- information about this plugin ------------->
chatbot.append([
"函数插件功能?",
"对整个Latex项目进行翻译, 生成中文PDF。函数插件贡献者: Binary-Husky。注意事项: 此插件Windows支持最佳,Linux下必须使用Docker安装,详见项目主README.md。目前仅支持GPT3.5/GPT4,其他模型转化效果未知。目前对机器学习类文献转化效果最好,其他类型文献转化效果未知。"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# <-------------- more requirements ------------->
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
more_req = plugin_kwargs.get("advanced_arg", "")
no_cache = more_req.startswith("--no-cache")
if no_cache: more_req = more_req[len("--no-cache"):].strip() # drop the flag so it is not forwarded to the prompt
allow_cache = not no_cache
_switch_prompt_ = partial(switch_prompt, more_requirement=more_req)
# <-------------- check deps ------------->
try:
import glob, os, time, subprocess
subprocess.Popen(['pdflatex', '-version'])
from .latex_fns.latex_actions import Latex精细分解与转化, 编译Latex
except Exception as e:
chatbot.append([ f"解析项目: {txt}",
f"尝试执行Latex指令失败。Latex没有安装, 或者不在环境变量PATH中。安装方法https://tug.org/texlive/。报错信息\n\n```\n\n{trimmed_format_exc()}\n\n```\n\n"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# <-------------- clear history and read input ------------->
history = []
txt, arxiv_id = yield from arxiv_download(chatbot, history, txt, allow_cache)
if txt.endswith('.pdf'):
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"发现已经存在翻译好的PDF文档")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
if os.path.exists(txt):
project_folder = txt
else:
if txt == "": txt = '空空如也的输入栏'
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无法处理: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)]
if len(file_manifest) == 0:
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.tex文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# <-------------- if is a zip/tar file ------------->
project_folder = desend_to_extracted_folder_if_exist(project_folder)
# <-------------- move latex project away from temp folder ------------->
project_folder = move_project(project_folder, arxiv_id)
# <-------------- if merge_translate_zh is already generated, skip gpt req ------------->
if not os.path.exists(project_folder + '/merge_translate_zh.tex'):
yield from Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
chatbot, history, system_prompt, mode='translate_zh', switch_prompt=_switch_prompt_)
# <-------------- compile PDF ------------->
success = yield from 编译Latex(chatbot, history, main_file_original='merge', main_file_modified='merge_translate_zh', mode='translate_zh',
work_folder_original=project_folder, work_folder_modified=project_folder, work_folder=project_folder)
# <-------------- zip PDF ------------->
zip_res = zip_result(project_folder)
if success:
chatbot.append((f"成功啦", '请查收结果(压缩包)...'))
yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # refresh the UI
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
else:
chatbot.append((f"失败了", '虽然PDF生成失败了, 但请查收结果(压缩包), 内含已经翻译的Tex文档, 您可以到Github Issue区, 用该压缩包进行反馈。如系统是Linux,请检查系统字体见Github wiki ...'))
yield from update_ui(chatbot=chatbot, history=history); time.sleep(1) # refresh the UI
promote_file_to_downloadzone(file=zip_res, chatbot=chatbot)
# <-------------- we are done ------------->
return success
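`arxiv_download` accepts either a bare arXiv ID or an `https://arxiv.org/abs/...` URL, trims a version suffix, and swaps `/abs/` for `/e-print/` to fetch the source tarball. A stricter sketch of that normalization with a regular expression, covering only new-style IDs (`YYMM.NNNN` or `YYMM.NNNNN`, optional `vN`); `normalize_arxiv` is a hypothetical helper, and old-style IDs such as `hep-th/9901001` are deliberately out of scope. Matching the ID rather than slicing a fixed number of characters also handles four-digit IDs such as `1407.1234v2`.

```python
import re

# New-style arXiv identifiers: YYMM.NNNN (2007-2014) or YYMM.NNNNN (2015 on), optional version.
ARXIV_ID = re.compile(r'(\d{4}\.\d{4,5})(v\d+)?')
ABS_PREFIX = 'https://arxiv.org/abs/'


def normalize_arxiv(text):
    """Return (abs_url, eprint_url, arxiv_id) for an arXiv ID or abs URL, or None if unrecognized."""
    text = text.strip()
    if text.startswith(ABS_PREFIX):
        text = text[len(ABS_PREFIX):]
    m = ARXIV_ID.fullmatch(text)
    if m is None:
        return None
    arxiv_id = m.group(1)  # version suffix dropped, as the plugin does
    return (ABS_PREFIX + arxiv_id,
            'https://arxiv.org/e-print/' + arxiv_id,
            arxiv_id)


print(normalize_arxiv('https://arxiv.org/abs/1707.06690v3'))
print(normalize_arxiv('1407.1234v2')[2])  # prints 1407.1234
```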

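Latex翻译中文并重新编译PDF treats a leading `--no-cache` in the advanced-argument box as a flag and forwards the rest of the text to the prompt. A literal prefix has to be removed by slicing or `str.removeprefix` (Python 3.9+): `str.lstrip(...)` treats its argument as a set of characters and returns a new string rather than changing the original. A hedged sketch; `split_no_cache_flag` is a hypothetical helper.

```python
FLAG = '--no-cache'


def split_no_cache_flag(more_req):
    """Return (allow_cache, remaining_requirement) for the advanced-argument text."""
    if more_req.startswith(FLAG):
        # Slice off the literal prefix; str.removeprefix(FLAG) does the same on Python 3.9+.
        return False, more_req[len(FLAG):].strip()
    return True, more_req


print(split_no_cache_flag('--no-cache keep the term "agent" untranslated'))
# prints (False, 'keep the term "agent" untranslated')
print('channel'.lstrip('--no-cache'))  # prints l (c, h, a, n, e are all in the character set)
```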

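`move_project` copies the project into a fresh work folder, and first descends one level when the upload has no top-level `.tex` file but exactly one entry that is a directory (a common layout for zipped projects). A standalone sketch of that check; `unwrap_single_folder` is a hypothetical name. Like the original, it relies on `glob`, which ignores hidden files.

```python
import glob
import os
import tempfile


def unwrap_single_folder(project_folder):
    """If project_folder has no .tex files and a single directory entry, return that directory."""
    items = glob.glob(os.path.join(project_folder, '*'))
    has_tex = len(glob.glob(os.path.join(project_folder, '*.tex'))) > 0
    if not has_tex and len(items) == 1 and os.path.isdir(items[0]):
        return items[0]
    return project_folder


with tempfile.TemporaryDirectory() as root:
    inner = os.path.join(root, 'paper')
    os.makedirs(inner)
    open(os.path.join(inner, 'main.tex'), 'w').close()
    print(unwrap_single_folder(root) == inner)   # prints True
    print(unwrap_single_folder(inner) == inner)  # prints True
```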
@@ -0,0 +1,141 @@
from toolbox import CatchException, update_ui, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
import datetime, json


def fetch_items(list_of_items, batch_size):
    for i in range(0, len(list_of_items), batch_size):
        yield list_of_items[i:i + batch_size]


def string_to_options(arguments):
    import argparse
    import shlex

    # Create an argparse.ArgumentParser instance
    parser = argparse.ArgumentParser()

    # Add command-line arguments
    parser.add_argument("--llm_to_learn", type=str, help="LLM model to learn", default="gpt-3.5-turbo")
    parser.add_argument("--prompt_prefix", type=str, help="Prompt prefix", default='')
    parser.add_argument("--system_prompt", type=str, help="System prompt", default='')
    parser.add_argument("--batch", type=int, help="Batch size", default=50)
    parser.add_argument("--pre_seq_len", type=int, help="pre_seq_len", default=50)
    parser.add_argument("--learning_rate", type=float, help="learning_rate", default=2e-2)
    parser.add_argument("--num_gpus", type=int, help="num_gpus", default=1)
    parser.add_argument("--json_dataset", type=str, help="json_dataset", default="")
    parser.add_argument("--ptuning_directory", type=str, help="ptuning_directory", default="")

    # Parse the arguments
    args = parser.parse_args(shlex.split(arguments))
    return args


@CatchException
def 微调数据集生成(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
    """
    txt             Text from the input box, e.g. a passage to translate, or a path to the files to process
    llm_kwargs      GPT model parameters such as temperature and top_p; usually passed through unchanged
    plugin_kwargs   Plugin parameters
    chatbot         Handle to the chat display box, used to show output to the user
    history         Chat history (prior context)
    system_prompt   Silent system prompt for GPT
    web_port        Port the application is currently running on
    """
    history = []    # clear history to avoid input overflow
    chatbot.append(("这是什么功能?", "[Local Message] 微调数据集生成"))
    if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
    args = plugin_kwargs.get("advanced_arg", None)
    if args is None:
        chatbot.append(("没给定指令", "退出"))
        yield from update_ui(chatbot=chatbot, history=history); return
    else:
        arguments = string_to_options(arguments=args)

    dat = []
    with open(txt, 'r', encoding='utf8') as f:
        for line in f.readlines():
            json_dat = json.loads(line)
            dat.append(json_dat["content"])

    llm_kwargs['llm_model'] = arguments.llm_to_learn
    for batch in fetch_items(dat, arguments.batch):
        res = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
            inputs_array=[f"{arguments.prompt_prefix}\n\n{b}" for b in (batch)],
            inputs_show_user_array=[f"Show Nothing" for _ in (batch)],
            llm_kwargs=llm_kwargs,
            chatbot=chatbot,
            history_array=[[] for _ in (batch)],
            sys_prompt_array=[arguments.system_prompt for _ in (batch)],
            max_workers=10  # maximum parallelism allowed by OpenAI
        )

        with open(txt+'.generated.json', 'a+', encoding='utf8') as f:
            for b, r in zip(batch, res[1::2]):
                f.write(json.dumps({"content":b, "summary":r}, ensure_ascii=False)+'\n')

    promote_file_to_downloadzone(txt+'.generated.json', rename_file='generated.json', chatbot=chatbot)
    return


@CatchException
def 启动微调(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
    """
    txt             Text from the input box, e.g. a passage to translate, or a path to the files to process
    llm_kwargs      GPT model parameters such as temperature and top_p; usually passed through unchanged
    plugin_kwargs   Plugin parameters
    chatbot         Handle to the chat display box, used to show output to the user
    history         Chat history (prior context)
    system_prompt   Silent system prompt for GPT
    web_port        Port the application is currently running on
    """
    import subprocess
    history = []    # clear history to avoid input overflow
    chatbot.append(("这是什么功能?", "[Local Message] 启动微调"))
    if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
    args = plugin_kwargs.get("advanced_arg", None)
    if args is None:
        chatbot.append(("没给定指令", "退出"))
        yield from update_ui(chatbot=chatbot, history=history); return
    else:
        arguments = string_to_options(arguments=args)

    pre_seq_len = arguments.pre_seq_len                # e.g. 128
    learning_rate = arguments.learning_rate            # e.g. 2e-2
    num_gpus = arguments.num_gpus                      # e.g. 1
    json_dataset = arguments.json_dataset              # e.g. 't_code.json'
    ptuning_directory = arguments.ptuning_directory    # e.g. '/home/hmp/ChatGLM2-6B/ptuning'

    command = f"torchrun --standalone --nnodes=1 --nproc-per-node={num_gpus} main.py \
        --do_train \
        --train_file AdvertiseGen/{json_dataset} \
        --validation_file AdvertiseGen/{json_dataset} \
        --preprocessing_num_workers 20 \
        --prompt_column content \
        --response_column summary \
        --overwrite_cache \
        --model_name_or_path THUDM/chatglm2-6b \
        --output_dir output/clothgen-chatglm2-6b-pt-{pre_seq_len}-{learning_rate} \
        --overwrite_output_dir \
        --max_source_length 256 \
        --max_target_length 256 \
        --per_device_train_batch_size 1 \
        --per_device_eval_batch_size 1 \
        --gradient_accumulation_steps 16 \
        --predict_with_generate \
        --max_steps 100 \
        --logging_steps 10 \
        --save_steps 20 \
        --learning_rate {learning_rate} \
        --pre_seq_len {pre_seq_len} \
        --quantization_bit 4"

    process = subprocess.Popen(command, shell=True, cwd=ptuning_directory)
    try:
        process.communicate(timeout=3600*24)
    except subprocess.TimeoutExpired:
        process.kill()
    return
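Both plugins read their settings from the advanced-argument box as one command-line-style string, and `微调数据集生成` then feeds the dataset to the model in fixed-size batches. A minimal sketch of that parsing plus batching, assuming only the standard library; `parse_advanced_arg` is a hypothetical name and only three of the plugin's flags are kept:

```python
import argparse
import shlex
from typing import Iterator, Sequence


def fetch_items(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    # Yield consecutive slices of at most batch_size items; the last one may be shorter.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def parse_advanced_arg(arguments: str) -> argparse.Namespace:
    # shlex.split honours shell-style quoting, so a prefix containing spaces
    # can be passed as --prompt_prefix "Summarize the following text".
    parser = argparse.ArgumentParser()
    parser.add_argument("--llm_to_learn", type=str, default="gpt-3.5-turbo")
    parser.add_argument("--prompt_prefix", type=str, default="")
    parser.add_argument("--batch", type=int, default=50)
    return parser.parse_args(shlex.split(arguments))


if __name__ == "__main__":
    opts = parse_advanced_arg('--prompt_prefix "Summarize the following text" --batch 2')
    dataset = ["a", "b", "c", "d", "e"]
    for batch in fetch_items(dataset, opts.batch):
        # Same prompt layout as the plugin: prefix, blank line, item.
        prompts = [f"{opts.prompt_prefix}\n\n{item}" for item in batch]
        print(len(prompts), prompts[0].splitlines()[0])
```

Note that `argparse` raises `SystemExit` rather than an ordinary exception when it meets an unknown flag, so a caller that wants to report the problem in the chat window may need to catch it explicitly.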

@@ -1,130 +0,0 @@
"""
What is this?
This file contains unit tests for the function plugins.
How to run: python crazy_functions/crazy_functions_test.py
"""

def validate_path():
    import os, sys
    dir_name = os.path.dirname(__file__)
    root_dir_assume = os.path.abspath(os.path.dirname(__file__) + '/..')
    os.chdir(root_dir_assume)
    sys.path.append(root_dir_assume)
validate_path() # validate path so you can run from base directory

from colorful import *
from toolbox import get_conf, ChatBotWithCookies
proxies, WEB_PORT, LLM_MODEL, CONCURRENT_COUNT, AUTHENTICATION, CHATBOT_HEIGHT, LAYOUT, API_KEY = \
    get_conf('proxies', 'WEB_PORT', 'LLM_MODEL', 'CONCURRENT_COUNT', 'AUTHENTICATION', 'CHATBOT_HEIGHT', 'LAYOUT', 'API_KEY')

llm_kwargs = {
    'api_key': API_KEY,
    'llm_model': LLM_MODEL,
    'top_p':1.0,
    'max_length': None,
    'temperature':1.0,
}
plugin_kwargs = { }
chatbot = ChatBotWithCookies(llm_kwargs)
history = []
system_prompt = "Serve me as a writing and programming assistant."
web_port = 1024

def test_解析一个Python项目():
    from crazy_functions.解析项目源代码 import 解析一个Python项目
    txt = "crazy_functions/test_project/python/dqn"
    for cookies, cb, hist, msg in 解析一个Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
        print(cb)

def test_解析一个Cpp项目():
    from crazy_functions.解析项目源代码 import 解析一个C项目
    txt = "crazy_functions/test_project/cpp/cppipc"
    for cookies, cb, hist, msg in 解析一个C项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
        print(cb)

def test_Latex英文润色():
    from crazy_functions.Latex全文润色 import Latex英文润色
    txt = "crazy_functions/test_project/latex/attention"
    for cookies, cb, hist, msg in Latex英文润色(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
        print(cb)

def test_Markdown中译英():
    from crazy_functions.批量Markdown翻译 import Markdown中译英
    txt = "README.md"
    for cookies, cb, hist, msg in Markdown中译英(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
        print(cb)

def test_批量翻译PDF文档():
    from crazy_functions.批量翻译PDF文档_多线程 import 批量翻译PDF文档
    txt = "crazy_functions/test_project/pdf_and_word"
    for cookies, cb, hist, msg in 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
        print(cb)

def test_谷歌检索小助手():
    from crazy_functions.谷歌检索小助手 import 谷歌检索小助手
    txt = "https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=auto+reinforcement+learning&btnG="
    for cookies, cb, hist, msg in 谷歌检索小助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
        print(cb)

def test_总结word文档():
    from crazy_functions.总结word文档 import 总结word文档
    txt = "crazy_functions/test_project/pdf_and_word"
    for cookies, cb, hist, msg in 总结word文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
        print(cb)

def test_下载arxiv论文并翻译摘要():
    from crazy_functions.下载arxiv论文翻译摘要 import 下载arxiv论文并翻译摘要
    txt = "1812.10695"
    for cookies, cb, hist, msg in 下载arxiv论文并翻译摘要(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
        print(cb)

def test_联网回答问题():
    from crazy_functions.联网的ChatGPT import 连接网络回答问题
    # txt = "“我们称之为高效”是什么梗?"
    # >> From search results 0, 1 and 2, "我们称之为高效" ("we call this efficient") is a phrase gaming communities use to describe strategies or behavior that are highly efficient and produce good results. It probably first became popular in Stellaris and later spread to other games such as Titan. Result 1 also notes that the phrase comes from a storyline in Stellaris.
    # txt = "为什么说枪毙P社玩家没有一个冤枉的?"
    # >> They all concern a Zhihu post quoting players joking about "shooting" Paradox (P社) players. The underlying topic is players' differing views on the political and historical elements of Paradox games, and the extreme positions some players promote, which others answer by joking about shooting them. The topic has no real substance; it is a joke or meme and should not be taken as a genuine attitude or opinion.
    # txt = "谁是应急食品?"
    # >> 'According to the search results above, "emergency food" is the nickname of the character Paimon in the game Genshin Impact.'
    # txt = "道路千万条,安全第一条。后面两句是?"
    # >> '行车不规范,亲人两行泪。' ("Drive carelessly, and your loved ones will weep.")
    # txt = "What is in the canister?"
    # >> Official information about what substance is in Smoke's canister in Rainbow Six Siege.
    # txt = "失败的man是什么?"
    # >> According to search result 1, "失败的man" refers to a Bilibili uploader who bought a Spider-Man cosplay suit and was mocked by netizens after wearing it. "失败的man" is a pun on the English name "spiderman", and netizens also nicknamed the uploader "Fly-Man" (苍蝇侠). So "失败的man" refers to this uploader being mocked in the Spider-Man costume.
    # txt = "老六是什么,起源于哪里?"
    # >> "老六" is internet slang that originated in the game CS:GO. It first described a player who goes it alone, detached from the team, or one who plays badly or sneakily; it later came to mean a sneaky player.
    # txt = "罗小黑战记因为什么经常被吐槽?"
    # >> 3. Update schedule. The Legend of Hei (罗小黑战记) updates irregularly, sometimes quickly and sometimes slowly, leaving viewers with the impression that the wait is too long.
    # txt = "沙特、伊朗最近的关系如何?"
    # >> Recently, with China's mediation, Saudi Arabia and Iran reached an agreement on March 10 to restore diplomatic relations, indicating that relations between the two countries have returned to normal.
    # txt = "You should have gone for the head. What does that mean?"
    # >> The phrase "You should have gone for the head" is a quote from the Marvel movies, Avengers: Infinity War and Avengers: Endgame. It was spoken by the character Thanos in Infinity War and by Thor in Endgame.
    txt = "AutoGPT是什么?"
    # >> AutoGPT is an open-source application based on the GPT-4 language model. It can autonomously carry out tasks according to user needs, including event analysis, writing marketing plans, programming, and math, with no user intervention. It can reason on its own, lay out implementation steps and details, and even ask and answer its own questions to complete tasks. It recently went viral on GitHub and became one of the hottest projects in the industry.
    # txt = "钟离带什么圣遗物?"
    for cookies, cb, hist, msg in 连接网络回答问题(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
        print("当前问答:", cb[-1][-1].replace("\n"," "))
    for i, it in enumerate(cb): print亮蓝(it[0]); print亮黄(it[1])

def test_解析ipynb文件():
    from crazy_functions.解析JupyterNotebook import 解析ipynb文件
    txt = "crazy_functions/test_samples"
    for cookies, cb, hist, msg in 解析ipynb文件(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
        print(cb)

# test_解析一个Python项目()
# test_Latex英文润色()
# test_Markdown中译英()
# test_批量翻译PDF文档()
# test_谷歌检索小助手()
# test_总结word文档()
# test_下载arxiv论文并翻译摘要()
# test_解析一个Cpp项目()
# test_联网回答问题()
test_解析ipynb文件()

input("程序完成,回车退出。")
print("退出。")

@@ -1,4 +1,7 @@
from toolbox import update_ui, get_conf, trimmed_format_exc
from toolbox import update_ui, get_conf, trimmed_format_exc, get_log_folder
import threading
import os
import logging
def input_clipping(inputs, history, max_token_limit):
import numpy as np
@@ -129,6 +132,11 @@ def request_gpt_model_in_new_thread_with_ui_alive(
    yield from update_ui(chatbot=chatbot, history=[]) # if it eventually succeeded, remove the error message
    return final_result

def can_multi_process(llm):
    if llm.startswith('gpt-'): return True
    if llm.startswith('api2d-'): return True
    if llm.startswith('azure-'): return True
    return False
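This change replaces the inline `startswith` chain with `can_multi_process` (adding the `azure-` prefix), and every model outside that list is forced to a single worker, since the source notes that multithreading ChatGLM can cause severe stalls. A hedged sketch of that gating around a `ThreadPoolExecutor`; `pick_max_workers`, `run_batch`, and `MULTI_PROCESS_PREFIXES` are hypothetical names, and the fallback of 3 for non-positive values mirrors the `if max_workers <= 0` line in this diff:

```python
from concurrent.futures import ThreadPoolExecutor

# Prefixes treated as remote APIs that tolerate concurrent requests (mirrors can_multi_process).
MULTI_PROCESS_PREFIXES = ("gpt-", "api2d-", "azure-")


def can_multi_process(llm: str) -> bool:
    # str.startswith accepts a tuple, so one call checks every prefix.
    return llm.startswith(MULTI_PROCESS_PREFIXES)


def pick_max_workers(llm: str, requested: int) -> int:
    # Replace non-positive values with 3, then force a single worker for
    # models (e.g. locally hosted ones) that are not on the allow-list.
    workers = requested if requested > 0 else 3
    return workers if can_multi_process(llm) else 1


def run_batch(llm: str, inputs, fn, requested_workers: int = 8):
    with ThreadPoolExecutor(max_workers=pick_max_workers(llm, requested_workers)) as executor:
        # Executor.map returns results in input order, not completion order.
        return list(executor.map(fn, inputs))
```

Keeping the prefixes in one tuple means adding a new API-backed model family is a one-line change.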
def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
inputs_array, inputs_show_user_array, llm_kwargs,
@@ -174,7 +182,7 @@ def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
except: max_workers = 8
if max_workers <= 0: max_workers = 3
# 屏蔽掉 chatglm的多线程,可能会导致严重卡顿
if not (llm_kwargs['llm_model'].startswith('gpt-') or llm_kwargs['llm_model'].startswith('api2d-')):
if not can_multi_process(llm_kwargs['llm_model']):
max_workers = 1
executor = ThreadPoolExecutor(max_workers=max_workers)
@@ -259,9 +267,6 @@ def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
time.sleep(refresh_interval)
cnt += 1
worker_done = [h.done() for h in futures]
if all(worker_done):
executor.shutdown()
break
# 更好的UI视觉效果
observe_win = []
# 每个线程都要“喂狗”(看门狗)
@@ -280,7 +285,10 @@ def request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
# 在前端打印些好玩的东西
chatbot[-1] = [chatbot[-1][0], f'多线程操作已经开始,完成情况: \n\n{stat_str}' + ''.join(['.']*(cnt % 10+1))]
yield from update_ui(chatbot=chatbot, history=[]) # 刷新界面
if all(worker_done):
executor.shutdown()
break
# 异步任务结束
gpt_response_collection = []
for inputs_show_user, f in zip(inputs_show_user_array, futures):
@@ -463,14 +471,16 @@ def read_and_clean_pdf_text(fp):
'- ', '') for t in text_areas['blocks'] if 'lines' in t]
############################## <第 2 步,获取正文主字体> ##################################
fsize_statiscs = {}
for span in meta_span:
if span[1] not in fsize_statiscs: fsize_statiscs[span[1]] = 0
fsize_statiscs[span[1]] += span[2]
main_fsize = max(fsize_statiscs, key=fsize_statiscs.get)
if REMOVE_FOOT_NOTE:
give_up_fize_threshold = main_fsize * REMOVE_FOOT_FFSIZE_PERCENT
try:
fsize_statiscs = {}
for span in meta_span:
if span[1] not in fsize_statiscs: fsize_statiscs[span[1]] = 0
fsize_statiscs[span[1]] += span[2]
main_fsize = max(fsize_statiscs, key=fsize_statiscs.get)
if REMOVE_FOOT_NOTE:
give_up_fize_threshold = main_fsize * REMOVE_FOOT_FFSIZE_PERCENT
except:
raise RuntimeError(f'抱歉, 我们暂时无法解析此PDF文档: {fp}')
############################## <第 3 步,切分和重新整合> ##################################
mega_sec = []
sec = []
@@ -585,11 +595,16 @@ def get_files_from_everything(txt, type): # type='.md'
# 网络的远程文件
import requests
from toolbox import get_conf
from toolbox import get_log_folder, gen_time_str
proxies, = get_conf('proxies')
r = requests.get(txt, proxies=proxies)
with open('./gpt_log/temp'+type, 'wb+') as f: f.write(r.content)
project_folder = './gpt_log/'
file_manifest = ['./gpt_log/temp'+type]
try:
r = requests.get(txt, proxies=proxies)
except:
raise ConnectionRefusedError(f"无法下载资源{txt},请检查。")
path = os.path.join(get_log_folder(plugin_name='web_download'), gen_time_str()+type)
with open(path, 'wb+') as f: f.write(r.content)
project_folder = get_log_folder(plugin_name='web_download')
file_manifest = [path]
elif txt.endswith(type):
# 直接给定文件
file_manifest = [txt]
@@ -606,3 +621,196 @@ def get_files_from_everything(txt, type): # type='.md'
success = False
return success, file_manifest, project_folder
def Singleton(cls):
    _instance = {}

    def _singleton(*args, **kargs):
        if cls not in _instance:
            _instance[cls] = cls(*args, **kargs)
        return _instance[cls]

    return _singleton
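`Singleton` caches one instance per class in a closure dictionary. If two threads call a decorated class for the first time at the same moment, both may pass the `cls not in _instance` check and construct it twice, which matters when construction is expensive (for example loading an embedding model). A sketch of a lock-guarded variant using double-checked locking, assuming that stricter guarantee is wanted; `ThreadSafeSingleton` is a hypothetical name, not part of the project:

```python
import threading


def ThreadSafeSingleton(cls):
    # One instance per decorated class, created lazily on first call.
    instances = {}
    lock = threading.Lock()

    def get_instance(*args, **kwargs):
        # Fast path skips the lock once the instance exists;
        # the second check inside the lock prevents double construction.
        if cls not in instances:
            with lock:
                if cls not in instances:
                    instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance
```

As with the original decorator, the decorated name becomes a function, so `isinstance(obj, knowledge_archive_interface)` would not work.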
@Singleton
class knowledge_archive_interface():
    def __init__(self) -> None:
        self.threadLock = threading.Lock()
        self.current_id = ""
        self.kai_path = None
        self.qa_handle = None
        self.text2vec_large_chinese = None

    def get_chinese_text2vec(self):
        if self.text2vec_large_chinese is None:
            # < ------------------- warm up the text-vectorization module --------------- >
            from toolbox import ProxyNetworkActivate
            print('Checking Text2vec ...')
            from langchain.embeddings.huggingface import HuggingFaceEmbeddings
            with ProxyNetworkActivate('Download_LLM'):    # temporarily activate the proxy network
                self.text2vec_large_chinese = HuggingFaceEmbeddings(model_name="GanymedeNil/text2vec-large-chinese")

        return self.text2vec_large_chinese

    def feed_archive(self, file_manifest, id="default"):
        self.threadLock.acquire()
        # import uuid
        self.current_id = id
        from zh_langchain import construct_vector_store
        self.qa_handle, self.kai_path = construct_vector_store(
            vs_id=self.current_id,
            files=file_manifest,
            sentence_size=100,
            history=[],
            one_conent="",
            one_content_segmentation="",
            text2vec = self.get_chinese_text2vec(),
        )
        self.threadLock.release()

    def get_current_archive_id(self):
        return self.current_id

    def get_loaded_file(self):
        return self.qa_handle.get_loaded_file()

    def answer_with_archive_by_id(self, txt, id):
        self.threadLock.acquire()
        if not self.current_id == id:
            self.current_id = id
            from zh_langchain import construct_vector_store
            self.qa_handle, self.kai_path = construct_vector_store(
                vs_id=self.current_id,
                files=[],
                sentence_size=100,
                history=[],
                one_conent="",
                one_content_segmentation="",
                text2vec = self.get_chinese_text2vec(),
            )
        VECTOR_SEARCH_SCORE_THRESHOLD = 0
        VECTOR_SEARCH_TOP_K = 4
        CHUNK_SIZE = 512
        resp, prompt = self.qa_handle.get_knowledge_based_conent_test(
            query = txt,
            vs_path = self.kai_path,
            score_threshold=VECTOR_SEARCH_SCORE_THRESHOLD,
            vector_search_top_k=VECTOR_SEARCH_TOP_K,
            chunk_conent=True,
            chunk_size=CHUNK_SIZE,
            text2vec = self.get_chinese_text2vec(),
        )
        self.threadLock.release()
        return resp, prompt

@Singleton
class nougat_interface():
    def __init__(self):
        self.threadLock = threading.Lock()

    def nougat_with_timeout(self, command, cwd, timeout=3600):
        import subprocess
        logging.info(f'正在执行命令 {command}')
        process = subprocess.Popen(command, shell=True, cwd=cwd)
        try:
            stdout, stderr = process.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            process.kill()
            stdout, stderr = process.communicate()
            print("Process timed out!")
            return False
        return True

    def NOUGAT_parse_pdf(self, fp, chatbot, history):
        from toolbox import update_ui_lastest_msg

        yield from update_ui_lastest_msg("正在解析论文, 请稍候。进度:正在排队, 等待线程锁...",
                                         chatbot=chatbot, history=history, delay=0)
        self.threadLock.acquire()
        import glob, threading, os
        from toolbox import get_log_folder, gen_time_str
        dst = os.path.join(get_log_folder(plugin_name='nougat'), gen_time_str())
        os.makedirs(dst)

        yield from update_ui_lastest_msg("正在解析论文, 请稍候。进度正在加载NOUGAT... 提示首次运行需要花费较长时间下载NOUGAT参数",
                                         chatbot=chatbot, history=history, delay=0)
        self.nougat_with_timeout(f'nougat --out "{os.path.abspath(dst)}" "{os.path.abspath(fp)}"', os.getcwd(), timeout=3600)
        res = glob.glob(os.path.join(dst,'*.mmd'))
        if len(res) == 0:
            self.threadLock.release()
            raise RuntimeError("Nougat解析论文失败。")
        self.threadLock.release()
        return res[0]
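`nougat_with_timeout` follows the standard `Popen` → `communicate(timeout=...)` → `kill()` pattern. A minimal sketch of the same pattern, assuming an argument list instead of `shell=True` (with a shell, `kill()` signals the shell process, which depending on the shell may leave the real command running); `run_with_timeout` is a hypothetical name:

```python
import subprocess
import sys


def run_with_timeout(cmd, cwd=None, timeout=3600.0) -> bool:
    # Start the child without a shell so arguments need no quoting.
    process = subprocess.Popen(cmd, cwd=cwd)
    try:
        process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # kill() sends SIGKILL on POSIX (TerminateProcess on Windows);
        # the second communicate() reaps the child so it does not linger.
        process.kill()
        process.communicate()
        return False
    return True


if __name__ == "__main__":
    print(run_with_timeout([sys.executable, "-c", "print('hello')"], timeout=30))
```

Like the original, it only reports whether the process finished in time, not its exit code; `NOUGAT_parse_pdf` instead decides success by checking for a `*.mmd` output file.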
def try_install_deps(deps, reload_m=[]):
    import subprocess, sys, importlib
    for dep in deps:
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--user', dep])
    import site
    importlib.reload(site)
    for m in reload_m:
        importlib.reload(__import__(m))


HTML_CSS = """
.row {
  display: flex;
  flex-wrap: wrap;
}

.column {
  flex: 1;
  padding: 10px;
}

.table-header {
  font-weight: bold;
  border-bottom: 1px solid black;
}

.table-row {
  border-bottom: 1px solid lightgray;
}

.table-cell {
  padding: 5px;
}
"""

TABLE_CSS = """
<div class="row table-row">
    <div class="column table-cell">REPLACE_A</div>
    <div class="column table-cell">REPLACE_B</div>
</div>
"""

class construct_html():
    def __init__(self) -> None:
        self.css = HTML_CSS
        self.html_string = f'<!DOCTYPE html><head><meta charset="utf-8"><title>翻译结果</title><style>{self.css}</style></head>'

    def add_row(self, a, b):
        tmp = TABLE_CSS
        from toolbox import markdown_convertion
        tmp = tmp.replace('REPLACE_A', markdown_convertion(a))
        tmp = tmp.replace('REPLACE_B', markdown_convertion(b))
        self.html_string += tmp

    def save_file(self, file_name):
        with open(os.path.join(get_log_folder(), file_name), 'w', encoding='utf8') as f:
            f.write(self.html_string.encode('utf-8', 'ignore').decode())
        return os.path.join(get_log_folder(), file_name)
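`construct_html` builds a side-by-side page by substituting `REPLACE_A`/`REPLACE_B` in a row template with Markdown rendered by the project's `markdown_convertion`. A sketch of the same layout under a different assumption, that both columns are plain text and therefore need HTML escaping; `SideBySideHtml` and `render` are hypothetical names, and this version also adds the `<html>` and `<body>` elements the original omits:

```python
import html

CSS = ".row{display:flex;flex-wrap:wrap}.column{flex:1;padding:10px}"
ROW_TEMPLATE = (
    '<div class="row table-row">'
    '<div class="column table-cell">{a}</div>'
    '<div class="column table-cell">{b}</div>'
    "</div>"
)


class SideBySideHtml:
    def __init__(self, title: str = "translation") -> None:
        # Collect fragments in a list and join once at the end.
        self.parts = [
            f'<!DOCTYPE html><html><head><meta charset="utf-8"><title>{html.escape(title)}</title>'
            f"<style>{CSS}</style></head><body>"
        ]

    def add_row(self, a: str, b: str) -> None:
        # Escape plain text; the original renders Markdown to HTML instead.
        self.parts.append(ROW_TEMPLATE.format(a=html.escape(a), b=html.escape(b)))

    def render(self) -> str:
        return "".join(self.parts) + "</body></html>"
```

Escaping keeps characters such as `<` in LaTeX or code snippets from being interpreted as markup when the text is not Markdown.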
def get_plugin_arg(plugin_kwargs, key, default):
    # if the argument is an empty string, treat it as absent
    if (key in plugin_kwargs) and (plugin_kwargs[key] == ""): plugin_kwargs.pop(key)
    # normal case
    return plugin_kwargs.get(key, default)

@@ -0,0 +1,70 @@
import time
import importlib.util
from toolbox import trimmed_format_exc, gen_time_str, get_log_folder
from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc, is_the_upload_folder
from toolbox import promote_file_to_downloadzone, get_log_folder, update_ui_lastest_msg
import multiprocessing

def get_class_name(class_string):
    import re
    # Use regex to extract the class name
    class_name = re.search(r'class (\w+)\(', class_string).group(1)
    return class_name

def try_make_module(code, chatbot):
    module_file = 'gpt_fn_' + gen_time_str().replace('-','_')
    fn_path = f'{get_log_folder(plugin_name="gen_plugin_verify")}/{module_file}.py'
    with open(fn_path, 'w', encoding='utf8') as f: f.write(code)
    promote_file_to_downloadzone(fn_path, chatbot=chatbot)
    class_name = get_class_name(code)
    manager = multiprocessing.Manager()
    return_dict = manager.dict()
    p = multiprocessing.Process(target=is_function_successfully_generated, args=(fn_path, class_name, return_dict))
    # only has 10 seconds to run
    p.start(); p.join(timeout=10)
    if p.is_alive(): p.terminate(); p.join()
    p.close()
    return return_dict["success"], return_dict['traceback']

# Check whether the generated module imports and its class can be instantiated
def is_function_successfully_generated(fn_path, class_name, return_dict):
    return_dict['success'] = False
    return_dict['traceback'] = ""
    try:
        # Create a spec for the module
        module_spec = importlib.util.spec_from_file_location('example_module', fn_path)
        # Load the module
        example_module = importlib.util.module_from_spec(module_spec)
        module_spec.loader.exec_module(example_module)
        # Now you can use the module
        some_class = getattr(example_module, class_name)
        # Now you can create an instance of the class
        instance = some_class()
        return_dict['success'] = True
        return
    except:
        return_dict['traceback'] = trimmed_format_exc()
        return

def subprocess_worker(code, file_path, return_dict):
    return_dict['result'] = None
    return_dict['success'] = False
    return_dict['traceback'] = ""
    try:
        module_file = 'gpt_fn_' + gen_time_str().replace('-','_')
        fn_path = f'{get_log_folder(plugin_name="gen_plugin_run")}/{module_file}.py'
        with open(fn_path, 'w', encoding='utf8') as f: f.write(code)
        class_name = get_class_name(code)
        # Create a spec for the module
        module_spec = importlib.util.spec_from_file_location('example_module', fn_path)
        # Load the module
        example_module = importlib.util.module_from_spec(module_spec)
        module_spec.loader.exec_module(example_module)
        # Now you can use the module
        some_class = getattr(example_module, class_name)
        # Now you can create an instance of the class
        instance = some_class()
        return_dict['result'] = instance.run(file_path)
        return_dict['success'] = True
    except:
        return_dict['traceback'] = trimmed_format_exc()
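`try_make_module` and `subprocess_worker` write GPT-generated code to a `.py` file, find the first `class Name(` with a regex, and import the file with `importlib.util`. A minimal in-process sketch of that loading step, assuming the generated class has a no-argument constructor and a `run` method as in `subprocess_worker`; `load_and_run` and `run_generated_code` are hypothetical names:

```python
import importlib.util
import os
import re
import tempfile


def get_class_name(code: str) -> str:
    # Same pattern as the plugin: the first "class Name(" in the source.
    match = re.search(r"class (\w+)\(", code)
    if match is None:
        raise ValueError("no 'class Name(' definition found")
    return match.group(1)


def load_and_run(fn_path: str, class_name: str, *args):
    # Build a module object from an arbitrary file path and execute it.
    spec = importlib.util.spec_from_file_location("generated_module", fn_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    instance = getattr(module, class_name)()
    return instance.run(*args)


def run_generated_code(code: str, *args):
    # Write the code to a temporary .py file, then load it and call run().
    with tempfile.TemporaryDirectory() as tmp:
        fn_path = os.path.join(tmp, "gpt_fn_generated.py")
        with open(fn_path, "w", encoding="utf8") as f:
            f.write(code)
        return load_and_run(fn_path, get_class_name(code), *args)
```

The plugin verifies the code inside a `multiprocessing.Process` with a 10-second `join` timeout; this sketch executes it in the current interpreter, so it suits trusted input only. The regex also misses classes declared without parentheses (`class Foo:`).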

@@ -0,0 +1,111 @@
"""
https://github.com/langchain-ai/langchain/blob/master/docs/extras/modules/model_io/output_parsers/pydantic.ipynb

Example 1.

# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can add custom validation logic easily with Pydantic.
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field


Example 2.

# Here's another example, but with a compound typed field.
class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")
"""

import json, re, logging


PYDANTIC_FORMAT_INSTRUCTIONS = """The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {{"properties": {{"foo": {{"title": "Foo", "description": "a list of strings", "type": "array", "items": {{"type": "string"}}}}}}, "required": ["foo"]}}
the object {{"foo": ["bar", "baz"]}} is a well-formatted instance of the schema. The object {{"properties": {{"foo": ["bar", "baz"]}}}} is not well-formatted.

Here is the output schema:
```
{schema}
```"""


PYDANTIC_FORMAT_INSTRUCTIONS_SIMPLE = """The output should be formatted as a JSON instance that conforms to the JSON schema below.
```
{schema}
```"""

class JsonStringError(Exception): ...

class GptJsonIO():

    def __init__(self, schema, example_instruction=True):
        self.pydantic_object = schema
        self.example_instruction = example_instruction
        self.format_instructions = self.generate_format_instructions()

    def generate_format_instructions(self):
        schema = self.pydantic_object.schema()

        # Remove extraneous fields.
        reduced_schema = schema
        if "title" in reduced_schema:
            del reduced_schema["title"]
        if "type" in reduced_schema:
            del reduced_schema["type"]
        # Ensure json in context is well-formed with double quotes.
        # (computed before branching so both branches can use it)
        schema_str = json.dumps(reduced_schema)
        if self.example_instruction:
            return PYDANTIC_FORMAT_INSTRUCTIONS.format(schema=schema_str)
        else:
            return PYDANTIC_FORMAT_INSTRUCTIONS_SIMPLE.format(schema=schema_str)

    def generate_output(self, text):
        # Greedy search for 1st json candidate.
        match = re.search(
            r"\{.*\}", text.strip(), re.MULTILINE | re.IGNORECASE | re.DOTALL
        )
        json_str = ""
        if match: json_str = match.group()
        json_object = json.loads(json_str, strict=False)
        final_object = self.pydantic_object.parse_obj(json_object)
        return final_object

    def generate_repair_prompt(self, broken_json, error):
        prompt = "Fix a broken json string.\n\n" + \
                 "(1) The broken json string that needs fixing is: \n\n" + \
                 "```" + "\n" + \
                 broken_json + "\n" + \
                 "```" + "\n\n" + \
                 "(2) The error message is: \n\n" + \
                 error + "\n\n" + \
                 "Now, fix this json string. \n\n"
        return prompt

    def generate_output_auto_repair(self, response, gpt_gen_fn):
        """
        response: string containing candidate json
        gpt_gen_fn: gpt_gen_fn(inputs, sys_prompt)
        """
        try:
            result = self.generate_output(response)
        except Exception as e:
            try:
                logging.info(f'Repairing json: {response}')
                repair_prompt = self.generate_repair_prompt(broken_json = response, error=repr(e))
                result = self.generate_output(gpt_gen_fn(repair_prompt, self.format_instructions))
                logging.info('Repair json success.')
            except Exception as e:
                # out of options; give up
                logging.info('Repair json fail.')
                raise JsonStringError('Cannot repair json.', str(e))
        return result
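`GptJsonIO` greedily extracts the first `{...}` span from the model's reply, parses it, and on failure asks the model once to repair it. A dependency-free sketch of that flow, assuming plain dictionaries instead of a pydantic schema (so the `parse_obj` validation step is omitted); `extract_json`, `extract_json_with_repair`, and the `ask_llm` callable are hypothetical names:

```python
import json
import re
from typing import Callable


class JsonStringError(Exception):
    pass


def extract_json(text: str) -> dict:
    # Greedy match: from the first "{" to the last "}", spanning newlines.
    match = re.search(r"\{.*\}", text.strip(), re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found")
    return json.loads(match.group(), strict=False)


def extract_json_with_repair(text: str, ask_llm: Callable[[str], str]) -> dict:
    try:
        return extract_json(text)
    except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
        prompt = f"Fix a broken json string.\n\nBroken JSON:\n{text}\n\nError message: {e!r}"
        try:
            # One repair round-trip; a second failure is reported, not retried.
            return extract_json(ask_llm(prompt))
        except ValueError as e2:
            raise JsonStringError("Cannot repair json.") from e2
```

The greedy pattern tolerates chatty text around the object, but if the reply contains two separate objects it captures everything between the first `{` and the last `}`, which then fails to parse and triggers the repair path.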

@@ -0,0 +1,447 @@
from toolbox import update_ui, update_ui_lastest_msg, get_log_folder
from toolbox import zip_folder, objdump, objload, promote_file_to_downloadzone
from .latex_toolbox import PRESERVE, TRANSFORM
from .latex_toolbox import set_forbidden_text, set_forbidden_text_begin_end, set_forbidden_text_careful_brace
from .latex_toolbox import reverse_forbidden_text_careful_brace, reverse_forbidden_text, convert_to_linklist, post_process
from .latex_toolbox import fix_content, find_main_tex_file, merge_tex_files, compile_latex_with_timeout

import os, shutil
import re
import numpy as np

pj = os.path.join


def split_subprocess(txt, project_folder, return_dict, opts):
    """
    Break down a LaTeX file into a linked list; each node uses a
    preserve flag to indicate whether it should be processed by GPT.
    """
    text = txt
    mask = np.zeros(len(txt), dtype=np.uint8) + TRANSFORM

    # absorb everything above the title and authors
    text, mask = set_forbidden_text(text, mask, r"^(.*?)\\maketitle", re.DOTALL)
    text, mask = set_forbidden_text(text, mask, r"^(.*?)\\begin{document}", re.DOTALL)
    # absorb \iffalse ... \fi comments
    text, mask = set_forbidden_text(text, mask, r"\\iffalse(.*?)\\fi", re.DOTALL)
    # absorb begin-end pairs spanning no more than 42 lines
    text, mask = set_forbidden_text_begin_end(text, mask, r"\\begin\{([a-z\*]*)\}(.*?)\\end\{\1\}", re.DOTALL, limit_n_lines=42)
    # absorb anonymous (unlabeled) display equations
    text, mask = set_forbidden_text(text, mask, [ r"\$\$([^$]+)\$\$",  r"\\\[.*?\\\]" ], re.DOTALL)
    # absorb other miscellaneous items
    text, mask = set_forbidden_text(text, mask, [ r"\\section\{(.*?)\}", r"\\section\*\{(.*?)\}", r"\\subsection\{(.*?)\}", r"\\subsubsection\{(.*?)\}" ])
    text, mask = set_forbidden_text(text, mask, [ r"\\bibliography\{(.*?)\}", r"\\bibliographystyle\{(.*?)\}" ])
    text, mask = set_forbidden_text(text, mask, r"\\begin\{thebibliography\}.*?\\end\{thebibliography\}", re.DOTALL)
    text, mask = set_forbidden_text(text, mask, r"\\begin\{lstlisting\}(.*?)\\end\{lstlisting\}", re.DOTALL)
    text, mask = set_forbidden_text(text, mask, r"\\begin\{wraptable\}(.*?)\\end\{wraptable\}", re.DOTALL)
    text, mask = set_forbidden_text(text, mask, r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}", re.DOTALL)
    text, mask = set_forbidden_text(text, mask, [r"\\begin\{wrapfigure\}(.*?)\\end\{wrapfigure\}", r"\\begin\{wrapfigure\*\}(.*?)\\end\{wrapfigure\*\}"], re.DOTALL)
    text, mask = set_forbidden_text(text, mask, [r"\\begin\{figure\}(.*?)\\end\{figure\}", r"\\begin\{figure\*\}(.*?)\\end\{figure\*\}"], re.DOTALL)
    text, mask = set_forbidden_text(text, mask, [r"\\begin\{multline\}(.*?)\\end\{multline\}", r"\\begin\{multline\*\}(.*?)\\end\{multline\*\}"], re.DOTALL)
    text, mask = set_forbidden_text(text, mask, [r"\\begin\{table\}(.*?)\\end\{table\}", r"\\begin\{table\*\}(.*?)\\end\{table\*\}"], re.DOTALL)
    text, mask = set_forbidden_text(text, mask, [r"\\begin\{minipage\}(.*?)\\end\{minipage\}", r"\\begin\{minipage\*\}(.*?)\\end\{minipage\*\}"], re.DOTALL)
    text, mask = set_forbidden_text(text, mask, [r"\\begin\{align\*\}(.*?)\\end\{align\*\}", r"\\begin\{align\}(.*?)\\end\{align\}"], re.DOTALL)
    text, mask = set_forbidden_text(text, mask, [r"\\begin\{equation\}(.*?)\\end\{equation\}", r"\\begin\{equation\*\}(.*?)\\end\{equation\*\}"], re.DOTALL)
    text, mask = set_forbidden_text(text, mask, [r"\\includepdf\[(.*?)\]\{(.*?)\}", r"\\clearpage", r"\\newpage", r"\\appendix", r"\\tableofcontents", r"\\include\{(.*?)\}"])
    text, mask = set_forbidden_text(text, mask, [r"\\vspace\{(.*?)\}", r"\\hspace\{(.*?)\}", r"\\label\{(.*?)\}", r"\\begin\{(.*?)\}", r"\\end\{(.*?)\}", r"\\item "])
    text, mask = set_forbidden_text_careful_brace(text, mask, r"\\hl\{(.*?)\}", re.DOTALL)
    # reverse operations must come last
    text, mask = reverse_forbidden_text_careful_brace(text, mask, r"\\caption\{(.*?)\}", re.DOTALL, forbid_wrapper=True)
    text, mask = reverse_forbidden_text_careful_brace(text, mask, r"\\abstract\{(.*?)\}", re.DOTALL, forbid_wrapper=True)
    text, mask = reverse_forbidden_text(text, mask, r"\\begin\{abstract\}(.*?)\\end\{abstract\}", re.DOTALL, forbid_wrapper=True)
    root = convert_to_linklist(text, mask)

    # final processing step to improve robustness
    root = post_process(root)

    # write an HTML debug file: preserved regions (PRESERVE) in red, transformed regions (TRANSFORM) in black
    with open(pj(project_folder, 'debug_log.html'), 'w', encoding='utf8') as f:
        segment_parts_for_gpt = []
        nodes = []
        node = root
        while True:
            nodes.append(node)
            show_html = node.string.replace('\n','<br/>')
            if not node.preserve:
                segment_parts_for_gpt.append(node.string)
                f.write(f'<p style="color:black;">#{node.range}{show_html}#</p>')
            else:
                f.write(f'<p style="color:red;">{show_html}</p>')
            node = node.next
            if node is None: break

    for n in nodes: n.next = None   # break the links
    return_dict['nodes'] = nodes
    return_dict['segment_parts_for_gpt'] = segment_parts_for_gpt
    return return_dict
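`split_subprocess` starts with every character marked `TRANSFORM`, flips each regex match to `PRESERVE` so it is not sent to GPT, runs the `reverse_*` calls last to re-mark captions and abstracts for translation, and finally turns the mask into a linked list. A simplified sketch of the mask-to-segments idea, assuming a plain list instead of a NumPy array and `(preserve, text)` pairs instead of linked-list nodes; the flag values and the names `mark_preserved` and `to_segments` are assumptions, not the definitions in `latex_toolbox`:

```python
import re
from typing import List, Tuple

PRESERVE, TRANSFORM = 0, 1  # assumed flag values for this sketch


def mark_preserved(text: str, mask: List[int], pattern: str, flags: int = 0) -> None:
    # Flag every character covered by a regex match as PRESERVE.
    for m in re.finditer(pattern, text, flags):
        for i in range(m.start(), m.end()):
            mask[i] = PRESERVE


def to_segments(text: str, mask: List[int]) -> List[Tuple[bool, str]]:
    # Collapse runs of equal flags into (preserve, substring) pairs.
    segments: List[Tuple[bool, str]] = []
    start = 0
    for i in range(1, len(text) + 1):
        if i == len(text) or mask[i] != mask[start]:
            segments.append((mask[start] == PRESERVE, text[start:i]))
            start = i
    return segments
```

Only the segments with `preserve == False` would be sent to the model; concatenating all segments in order always reproduces the input exactly.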
class LatexPaperSplit():
    """
    Break down a LaTeX file into a linked list; each node uses a
    preserve flag to indicate whether it should be processed by GPT.
    """
    def __init__(self) -> None:
        self.nodes = None
        self.msg = "*{\\scriptsize\\textbf{警告该PDF由GPT-Academic开源项目调用大语言模型+Latex翻译插件一键生成," + \
            "版权归原文作者所有。翻译内容可靠性无保障,请仔细鉴别并以原文为准。" + \
            "项目Github地址 \\url{https://github.com/binary-husky/gpt_academic/}。"
        # Please do not delete or modify this warning unless you are the original author of the paper
        # (if you are, you are welcome to contact the developers via the QQ group listed in the README)
        self.msg_declare = "为了防止大语言模型的意外谬误产生扩散影响,禁止移除或修改此警告。}}\\\\"

    def merge_result(self, arr, mode, msg, buggy_lines=[], buggy_line_surgery_n_lines=10):
        """
        Merge the result after the GPT process completed
        """
        result_string = ""
        node_cnt = 0
        line_cnt = 0

        for node in self.nodes:
            if node.preserve:
                line_cnt += node.string.count('\n')
                result_string += node.string
            else:
                translated_txt = fix_content(arr[node_cnt], node.string)
                begin_line = line_cnt
                end_line = line_cnt + translated_txt.count('\n')

                # reverse translation if any error
                if any([begin_line-buggy_line_surgery_n_lines <= b_line <= end_line+buggy_line_surgery_n_lines for b_line in buggy_lines]):
                    translated_txt = node.string

                result_string += translated_txt
                node_cnt += 1
                line_cnt += translated_txt.count('\n')

        if mode == 'translate_zh':
            pattern = re.compile(r'\\begin\{abstract\}.*\n')
            match = pattern.search(result_string)
            if not match:
                # match \abstract{xxxx}
                pattern_compile = re.compile(r"\\abstract\{(.*?)\}", flags=re.DOTALL)
                match = pattern_compile.search(result_string)
                if match: position = match.regs[1][0]
            else:
                # match \begin{abstract}xxxx\end{abstract}
                position = match.end()
            # insert the warning only if an abstract was found (avoids an AttributeError otherwise)
            if match:
                result_string = result_string[:position] + self.msg + msg + self.msg_declare + result_string[position:]
        return result_string

    def split(self, txt, project_folder, opts):
        """
        Break down a LaTeX file into a linked list; each node uses a
        preserve flag to indicate whether it should be processed by GPT.
        P.S. uses multiprocessing to avoid timeout errors
        """
        import multiprocessing
        manager = multiprocessing.Manager()
        return_dict = manager.dict()
        p = multiprocessing.Process(
            target=split_subprocess,
            args=(txt, project_folder, return_dict, opts))
        p.start()
        p.join()
        p.close()
        self.nodes = return_dict['nodes']
        self.sp = return_dict['segment_parts_for_gpt']
        return self.sp
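`merge_result` walks the nodes in order, substitutes GPT output for each non-preserved node, and restores the original text of any node that lies within `buggy_line_surgery_n_lines` of a line reported as faulty. A sketch of just that revert rule, assuming segments are `(preserve, text)` pairs and omitting the `fix_content` post-processing; `merge_segments` is a hypothetical name:

```python
from typing import List, Sequence, Tuple


def merge_segments(
    segments: Sequence[Tuple[bool, str]],   # (preserve, original text) in document order
    translations: Sequence[str],            # one entry per non-preserved segment
    buggy_lines: Sequence[int] = (),
    margin: int = 10,
) -> str:
    out: List[str] = []
    line_cnt = 0
    k = 0
    for preserve, original in segments:
        if preserve:
            piece = original
        else:
            piece = translations[k]
            k += 1
            begin, end = line_cnt, line_cnt + piece.count("\n")
            # Fall back to the original text if any reported error line
            # lies within `margin` lines of this translated chunk.
            if any(begin - margin <= b <= end + margin for b in buggy_lines):
                piece = original
        out.append(piece)
        # Count lines of whatever was actually emitted, as the original does.
        line_cnt += piece.count("\n")
    return "".join(out)
```

Reverting whole chunks rather than single lines trades some translated text for a much better chance that the second compilation succeeds.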
class LatexPaperFileGroup():
    """
    use tokenizer to break down text according to max_token_limit
    """
    def __init__(self):
        self.file_paths = []
        self.file_contents = []
        self.sp_file_contents = []
        self.sp_file_index = []
        self.sp_file_tag = []
        # count_token
        from request_llm.bridge_all import model_info
        enc = model_info["gpt-3.5-turbo"]['tokenizer']
        def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
        self.get_token_num = get_token_num
    def run_file_split(self, max_token_limit=1900):
        """
        use tokenizer to break down text according to max_token_limit
        """
        for index, file_content in enumerate(self.file_contents):
            if self.get_token_num(file_content) < max_token_limit:
                self.sp_file_contents.append(file_content)
                self.sp_file_index.append(index)
                self.sp_file_tag.append(self.file_paths[index])
            else:
                from ..crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf
                segments = breakdown_txt_to_satisfy_token_limit_for_pdf(file_content, self.get_token_num, max_token_limit)
                for j, segment in enumerate(segments):
                    self.sp_file_contents.append(segment)
                    self.sp_file_index.append(index)
                    self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.tex")
        print('Segmentation: done')
    def merge_result(self):
        self.file_result = ["" for _ in range(len(self.file_paths))]
        for r, k in zip(self.sp_file_result, self.sp_file_index):
            self.file_result[k] += r
    def write_result(self):
        manifest = []
        for path, res in zip(self.file_paths, self.file_result):
            with open(path + '.polish.tex', 'w', encoding='utf8') as f:
                manifest.append(path + '.polish.tex')
                f.write(res)
        return manifest
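`run_file_split` keeps a segment whole when it fits under `max_token_limit`; otherwise it cuts the segment into parts and records in `sp_file_index` which segment each part came from, so `merge_result` can concatenate the processed parts back in order. A minimal sketch of that split/merge round trip follows. The whitespace word count stands in for the tiktoken tokenizer, and the greedy line packing is a hypothetical stand-in for `breakdown_txt_to_satisfy_token_limit_for_pdf`, whose exact algorithm is not shown on this page.

```python
from typing import Callable, List, Tuple


def split_by_token_limit(texts: List[str], count_tokens: Callable[[str], int],
                         max_tokens: int) -> Tuple[List[str], List[int]]:
    """Split each text into chunks under max_tokens; remember each chunk's source index."""
    chunks, owners = [], []
    for index, text in enumerate(texts):
        if count_tokens(text) < max_tokens:
            chunks.append(text)
            owners.append(index)
            continue
        # Greedy line-based packing (hypothetical stand-in for the real splitter).
        # A single line that is itself too long becomes its own oversized chunk.
        current = ""
        for line in text.splitlines(keepends=True):
            if current and count_tokens(current + line) >= max_tokens:
                chunks.append(current)
                owners.append(index)
                current = ""
            current += line
        if current:
            chunks.append(current)
            owners.append(index)
    return chunks, owners


def merge_by_owner(results: List[str], owners: List[int], n_files: int) -> List[str]:
    """Concatenate processed chunks back into one string per original text."""
    merged = ["" for _ in range(n_files)]
    for result, owner in zip(results, owners):
        merged[owner] += result
    return merged


def count_words(s: str) -> int:
    # Whitespace word count: an assumed stand-in tokenizer, not tiktoken.
    return len(s.split())
```

Because chunks are concatenated without separators, the splitter must keep line endings (`keepends=True`) for the merge to reproduce the original layout.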
def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, mode='proofread', switch_prompt=None, opts=[]):
    import time, os, re
    from ..crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
    from .latex_actions import LatexPaperFileGroup, LatexPaperSplit
    #  <-------- locate the main tex file ---------->
    maintex = find_main_tex_file(file_manifest, mode)
    chatbot.append((f"定位主Latex文件", f'[Local Message] 分析结果该项目的Latex主文件是{maintex}, 如果分析错误, 请立即终止程序, 删除或修改歧义文件, 然后重试。主程序即将开始, 请稍候。'))
    yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
    time.sleep(3)
    #  <-------- read the LaTeX files and merge a multi-file tex project into one large tex file ---------->
    main_tex_basename = os.path.basename(maintex)
    assert main_tex_basename.endswith('.tex')
    main_tex_basename_bare = main_tex_basename[:-4]
    may_exist_bbl = pj(project_folder, f'{main_tex_basename_bare}.bbl')
    if os.path.exists(may_exist_bbl):
        shutil.copyfile(may_exist_bbl, pj(project_folder, f'merge.bbl'))
        shutil.copyfile(may_exist_bbl, pj(project_folder, f'merge_{mode}.bbl'))
        shutil.copyfile(may_exist_bbl, pj(project_folder, f'merge_diff.bbl'))
    with open(maintex, 'r', encoding='utf-8', errors='replace') as f:
        content = f.read()
    merged_content = merge_tex_files(project_folder, content, mode)
    with open(project_folder + '/merge.tex', 'w', encoding='utf-8', errors='replace') as f:
        f.write(merged_content)
    #  <-------- fine-grained splitting of the LaTeX file ---------->
    chatbot.append((f"Latex文件融合完成", f'[Local Message] 正在精细切分latex文件,这需要一段时间计算,文档越长耗时越长,请耐心等待。'))
    yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
    lps = LatexPaperSplit()
    res = lps.split(merged_content, project_folder, opts) # time-consuming call
    #  <-------- split LaTeX fragments that are too long ---------->
    pfg = LatexPaperFileGroup()
    for index, r in enumerate(res):
        pfg.file_paths.append('segment-' + str(index))
        pfg.file_contents.append(r)
    pfg.run_file_split(max_token_limit=1024)
    n_split = len(pfg.sp_file_contents)
    #  <-------- switch prompts as needed ---------->
    inputs_array, sys_prompt_array = switch_prompt(pfg, mode)
    inputs_show_user_array = [f"{mode} {f}" for f in pfg.sp_file_tag]
    if os.path.exists(pj(project_folder,'temp.pkl')):
        #  <-------- [debug only] if a debug cache file exists, skip the GPT request step ---------->
        pfg = objload(file=pj(project_folder,'temp.pkl'))
    else:
        #  <-------- multi-threaded GPT requests ---------->
        gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
            inputs_array=inputs_array,
            inputs_show_user_array=inputs_show_user_array,
            llm_kwargs=llm_kwargs,
            chatbot=chatbot,
            history_array=[[""] for _ in range(n_split)],
            sys_prompt_array=sys_prompt_array,
            # max_workers=5,  # limit on parallel tasks: at most 5 run at once, the rest wait in a queue
            scroller_max_len = 40
        )
        #  <-------- reassemble text fragments into complete tex segments ---------->
        pfg.sp_file_result = []
        for i_say, gpt_say, orig_content in zip(gpt_response_collection[0::2], gpt_response_collection[1::2], pfg.sp_file_contents):
            pfg.sp_file_result.append(gpt_say)
        pfg.merge_result()
        #  <-------- store temporarily for debugging ---------->
        pfg.get_token_num = None
        objdump(pfg, file=pj(project_folder,'temp.pkl'))
    write_html(pfg.sp_file_contents, pfg.sp_file_result, chatbot=chatbot, project_folder=project_folder)
    #  <-------- write output files ---------->
    msg = f"当前大语言模型: {llm_kwargs['llm_model']},当前语言模型温度设定: {llm_kwargs['temperature']}"
    final_tex = lps.merge_result(pfg.file_result, mode, msg)
    objdump((lps, pfg.file_result, mode, msg), file=pj(project_folder,'merge_result.pkl'))
    with open(project_folder + f'/merge_{mode}.tex', 'w', encoding='utf-8', errors='replace') as f:
        if mode != 'translate_zh' or "binary" in final_tex: f.write(final_tex)
    #  <-------- tidy up the results and exit ---------->
    chatbot.append((f"完成了吗?", 'GPT结果已输出, 即将编译PDF'))
    yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
    #  <-------- return ---------->
    return project_folder + f'/merge_{mode}.tex'
def remove_buggy_lines(file_path, log_path, tex_name, tex_name_pure, n_fix, work_folder_modified, fixed_line=[]):
    try:
        with open(log_path, 'r', encoding='utf-8', errors='replace') as f:
            log = f.read()
        import re
        # re.escape: match the '.' in the file name literally
        buggy_lines = re.findall(re.escape(tex_name)+':([0-9]{1,5}):', log)
        buggy_lines = [int(l) for l in buggy_lines]
        buggy_lines = sorted(buggy_lines)
        buggy_line = buggy_lines[0]-1
        print("reversing tex line that has errors", buggy_line)
        # reassemble, reverting the paragraphs that caused errors
        if buggy_line not in fixed_line:
            fixed_line.append(buggy_line)
        lps, file_result, mode, msg = objload(file=pj(work_folder_modified,'merge_result.pkl'))
        final_tex = lps.merge_result(file_result, mode, msg, buggy_lines=fixed_line, buggy_line_surgery_n_lines=5*n_fix)
        with open(pj(work_folder_modified, f"{tex_name_pure}_fix_{n_fix}.tex"), 'w', encoding='utf-8', errors='replace') as f:
            f.write(final_tex)
        return True, f"{tex_name_pure}_fix_{n_fix}", buggy_lines
    except Exception:
        print("Fatal error occurred, but we cannot identify error, please download zip, read latex log, and compile manually.")
        return False, -1, [-1]
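`remove_buggy_lines` relies on `pdflatex -file-line-error`, which prefixes each error message with `file:line:`. A small sketch of just the log-parsing step, assuming that format; it passes the file name through `re.escape` so the `.` in a name like `merge_translate_zh.tex` is matched literally.

```python
import re
from typing import List


def find_error_lines(log: str, tex_name: str) -> List[int]:
    """Return the sorted line numbers reported as 'file.tex:LINE:' in a pdflatex log."""
    pattern = re.escape(tex_name) + r':([0-9]{1,5}):'
    return sorted(int(n) for n in re.findall(pattern, log))
```

The smallest reported line is the one the retry loop reverts first, which is why the result is sorted.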
def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_folder_original, work_folder_modified, work_folder, mode='default'):
    import os, time
    n_fix = 1
    fixed_line = []
    max_try = 32
    chatbot.append([f"正在编译PDF文档", f'编译已经开始。当前工作路径为{work_folder},如果程序停顿5分钟以上,请直接去该路径下取回翻译结果,或者重启之后再度尝试 ...']); yield from update_ui(chatbot=chatbot, history=history)
    chatbot.append([f"正在编译PDF文档", '...']); yield from update_ui(chatbot=chatbot, history=history); time.sleep(1); chatbot[-1] = list(chatbot[-1]) # refresh the UI
    yield from update_ui_lastest_msg('编译已经开始...', chatbot, history) # refresh the Gradio front end
    while True:
        may_exist_bbl = pj(work_folder_modified, f'merge.bbl')
        target_bbl = pj(work_folder_modified, f'{main_file_modified}.bbl')
        if os.path.exists(may_exist_bbl) and not os.path.exists(target_bbl):
            shutil.copyfile(may_exist_bbl, target_bbl)
        # https://stackoverflow.com/questions/738755/dont-make-me-manually-abort-a-latex-compile-when-theres-an-error
        yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译原始PDF ...', chatbot, history) # refresh the Gradio front end
        ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_original}.tex', work_folder_original)
        yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译转化后的PDF ...', chatbot, history) # refresh the Gradio front end
        ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_modified}.tex', work_folder_modified)
        if ok and os.path.exists(pj(work_folder_modified, f'{main_file_modified}.pdf')):
            # continue with the steps below only if the second step succeeded
            yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译BibTex ...', chatbot, history) # refresh the Gradio front end
            if not os.path.exists(pj(work_folder_original, f'{main_file_original}.bbl')):
                ok = compile_latex_with_timeout(f'bibtex {main_file_original}.aux', work_folder_original)
            if not os.path.exists(pj(work_folder_modified, f'{main_file_modified}.bbl')):
                ok = compile_latex_with_timeout(f'bibtex {main_file_modified}.aux', work_folder_modified)
            yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 编译文献交叉引用 ...', chatbot, history) # refresh the Gradio front end
            ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_original}.tex', work_folder_original)
            ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_modified}.tex', work_folder_modified)
            ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_original}.tex', work_folder_original)
            ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_modified}.tex', work_folder_modified)
            if mode!='translate_zh':
                yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 使用latexdiff生成论文转化前后对比 ...', chatbot, history) # refresh the Gradio front end
                print( f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex')
                ok = compile_latex_with_timeout(f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex', os.getcwd())
                yield from update_ui_lastest_msg(f'尝试第 {n_fix}/{max_try} 次编译, 正在编译对比PDF ...', chatbot, history) # refresh the Gradio front end
                ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex', work_folder)
                ok = compile_latex_with_timeout(f'bibtex merge_diff.aux', work_folder)
                ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex', work_folder)
                ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex', work_folder)
        # <---------- check results ----------->
        results_ = ""
        original_pdf_success = os.path.exists(pj(work_folder_original, f'{main_file_original}.pdf'))
        modified_pdf_success = os.path.exists(pj(work_folder_modified, f'{main_file_modified}.pdf'))
        diff_pdf_success = os.path.exists(pj(work_folder, f'merge_diff.pdf'))
        results_ += f"原始PDF编译是否成功: {original_pdf_success};"
        results_ += f"转化PDF编译是否成功: {modified_pdf_success};"
        results_ += f"对比PDF编译是否成功: {diff_pdf_success};"
        yield from update_ui_lastest_msg(f'{n_fix}编译结束:<br/>{results_}...', chatbot, history) # refresh the Gradio front end
        if diff_pdf_success:
            result_pdf = pj(work_folder, f'merge_diff.pdf') # get pdf path (the diff is compiled in work_folder)
            promote_file_to_downloadzone(result_pdf, rename_file=None, chatbot=chatbot) # promote file to web UI
        if modified_pdf_success:
            yield from update_ui_lastest_msg(f'转化PDF编译已经成功, 即将退出 ...', chatbot, history) # refresh the Gradio front end
            result_pdf = pj(work_folder_modified, f'{main_file_modified}.pdf') # get pdf path
            origin_pdf = pj(work_folder_original, f'{main_file_original}.pdf') # get pdf path
            if os.path.exists(pj(work_folder, '..', 'translation')):
                shutil.copyfile(result_pdf, pj(work_folder, '..', 'translation', 'translate_zh.pdf'))
            promote_file_to_downloadzone(result_pdf, rename_file=None, chatbot=chatbot) # promote file to web UI
            # place the two PDFs side by side
            if original_pdf_success:
                try:
                    from .latex_toolbox import merge_pdfs
                    concat_pdf = pj(work_folder_modified, f'comparison.pdf')
                    merge_pdfs(origin_pdf, result_pdf, concat_pdf)
                    promote_file_to_downloadzone(concat_pdf, rename_file=None, chatbot=chatbot) # promote file to web UI
                except Exception as e:
                    pass
            return True # success
        else:
            if n_fix>=max_try: break
            n_fix += 1
            can_retry, main_file_modified, buggy_lines = remove_buggy_lines(
                file_path=pj(work_folder_modified, f'{main_file_modified}.tex'),
                log_path=pj(work_folder_modified, f'{main_file_modified}.log'),
                tex_name=f'{main_file_modified}.tex',
                tex_name_pure=f'{main_file_modified}',
                n_fix=n_fix,
                work_folder_modified=work_folder_modified,
                fixed_line=fixed_line
            )
            yield from update_ui_lastest_msg(f'由于最为关键的转化PDF编译失败, 将根据报错信息修正tex源文件并重试, 当前报错的latex代码处于第{buggy_lines}行 ...', chatbot, history) # refresh the Gradio front end
            if not can_retry: break
    return False # failure
def write_html(sp_file_contents, sp_file_result, chatbot, project_folder):
    # write html
    try:
        import shutil
        from ..crazy_utils import construct_html
        from toolbox import gen_time_str
        ch = construct_html()
        # one row per segment: original text next to its processed version
        for orig, trans in zip(sp_file_contents, sp_file_result):
            ch.add_row(a=orig, b=trans)
        create_report_file_name = f"{gen_time_str()}.trans.html"
        res = ch.save_file(create_report_file_name)
        shutil.copyfile(res, pj(project_folder, create_report_file_name))
        promote_file_to_downloadzone(file=res, chatbot=chatbot)
    except:
        from toolbox import trimmed_format_exc
        print('writing html result failed:', trimmed_format_exc())

@@ -0,0 +1,464 @@
import os, shutil
import re
import numpy as np
PRESERVE = 0
TRANSFORM = 1
pj = os.path.join
class LinkedListNode():
    """
    Linked List Node
    """
    def __init__(self, string, preserve=True) -> None:
        self.string = string
        self.preserve = preserve
        self.next = None
        self.range = None
        # self.begin_line = 0
        # self.begin_char = 0
def convert_to_linklist(text, mask):
    root = LinkedListNode("", preserve=True)
    current_node = root
    for c, m, i in zip(text, mask, range(len(text))):
        if (m==PRESERVE and current_node.preserve) \
            or (m==TRANSFORM and not current_node.preserve):
            # same kind as the current node: extend it
            current_node.string += c
        else:
            current_node.next = LinkedListNode(c, preserve=(m==PRESERVE))
            current_node = current_node.next
    return root
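`convert_to_linklist` walks the text once and starts a new node whenever the mask flips between PRESERVE and TRANSFORM. The same run-length grouping, sketched with a list of `(string, preserve)` tuples in place of `LinkedListNode`; unlike the original, it does not begin with an empty preserved root node.

```python
from typing import List, Tuple

PRESERVE, TRANSFORM = 0, 1


def mask_to_runs(text: str, mask: List[int]) -> List[Tuple[str, bool]]:
    """Group consecutive characters with the same mask value into (string, preserve) runs."""
    runs: List[Tuple[str, bool]] = []
    for ch, m in zip(text, mask):
        preserve = (m == PRESERVE)
        if runs and runs[-1][1] == preserve:
            runs[-1] = (runs[-1][0] + ch, preserve)  # extend the current run
        else:
            runs.append((ch, preserve))              # mask flipped: start a new run
    return runs
```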
def post_process(root):
    # fix braces
    node = root
    while True:
        string = node.string
        if node.preserve:
            node = node.next
            if node is None: break
            continue
        def break_check(string):
            str_stack = [""] # (lv, index)
            for i, c in enumerate(string):
                if c == '{':
                    str_stack.append('{')
                elif c == '}':
                    if len(str_stack) == 1:
                        print('stack fix')
                        return i
                    str_stack.pop(-1)
                else:
                    str_stack[-1] += c
            return -1
        bp = break_check(string)
        if bp == -1:
            pass
        elif bp == 0:
            node.string = string[:1]
            q = LinkedListNode(string[1:], False)
            q.next = node.next
            node.next = q
        else:
            node.string = string[:bp]
            q = LinkedListNode(string[bp:], False)
            q.next = node.next
            node.next = q
        node = node.next
        if node is None: break
    # mask out empty lines and sentences that are too short
    node = root
    while True:
        if len(node.string.strip())==0: node.preserve = True
        if len(node.string.strip())<42: node.preserve = True
        node = node.next
        if node is None: break
    node = root
    while True:
        if node.next and node.preserve and node.next.preserve:
            node.string += node.next.string
            node.next = node.next.next
        node = node.next
        if node is None: break
    # move leading and trailing line breaks out of TRANSFORM nodes
    node = root
    prev_node = None
    while True:
        if not node.preserve:
            lstriped_ = node.string.lstrip().lstrip('\n')
            if (prev_node is not None) and (prev_node.preserve) and (len(lstriped_)!=len(node.string)):
                prev_node.string += node.string[:-len(lstriped_)]
                node.string = lstriped_
            rstriped_ = node.string.rstrip().rstrip('\n')
            if (node.next is not None) and (node.next.preserve) and (len(rstriped_)!=len(node.string)):
                node.next.string = node.string[len(rstriped_):] + node.next.string
                node.string = rstriped_
        # =====
        prev_node = node
        node = node.next
        if node is None: break
    # annotate each node with its line range
    node = root
    n_line = 0
    expansion = 2
    while True:
        n_l = node.string.count('\n')
        node.range = [n_line-expansion, n_line+n_l+expansion] # range to revert if compilation fails
        n_line = n_line+n_l
        node = node.next
        if node is None: break
    return root
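The `break_check` helper in `post_process` finds the first `}` in a TRANSFORM node that closes a brace opened outside the node; the node is then split at that position, so the stray brace starts (or forms) a separate node. Only the nesting depth matters, so a counter can replace the string stack, as in this sketch:

```python
def first_unmatched_close(s: str) -> int:
    """Index of the first '}' with no matching '{' before it in s, or -1 if none."""
    depth = 0
    for i, c in enumerate(s):
        if c == '{':
            depth += 1
        elif c == '}':
            if depth == 0:
                return i
            depth -= 1
    return -1
```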
"""
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Latex segmentation with a binary mask (PRESERVE=0, TRANSFORM=1)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
"""
def set_forbidden_text(text, mask, pattern, flags=0):
    r"""
    Add a preserved text area to the paper.
    e.g. with pattern = r"\\begin\{algorithm\}(.*?)\\end\{algorithm\}"
    you can mask out (mask = PRESERVE, so the text becomes untouchable for GPT)
    everything between "\begin{algorithm}" and "\end{algorithm}"
    """
    if isinstance(pattern, list): pattern = '|'.join(pattern)
    pattern_compile = re.compile(pattern, flags)
    for res in pattern_compile.finditer(text):
        mask[res.span()[0]:res.span()[1]] = PRESERVE
    return text, mask
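`set_forbidden_text` overwrites the mask with PRESERVE over every regex match. A self-contained sketch, assuming the mask is a NumPy integer array initialised to TRANSFORM; that initialisation happens outside this file, so the dtype here is an assumption.

```python
import re

import numpy as np

PRESERVE, TRANSFORM = 0, 1


def preserve_matches(text: str, mask: np.ndarray, pattern, flags: int = 0) -> np.ndarray:
    """Mark every regex match in text as PRESERVE (same idea as set_forbidden_text)."""
    if isinstance(pattern, list):
        pattern = '|'.join(pattern)
    for m in re.finditer(pattern, text, flags):
        mask[m.start():m.end()] = PRESERVE
    return mask
```

Characters still marked TRANSFORM afterwards are the ones that would be sent to GPT.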
def reverse_forbidden_text(text, mask, pattern, flags=0, forbid_wrapper=True):
    r"""
    Move an area out of the preserved area (make the text editable for GPT).
    The first capture group of the pattern marks the editable content; when
    forbid_wrapper is True, the surrounding wrapper stays preserved.
    e.g.
    \begin{abstract} blablablablablabla. \end{abstract}
    """
    if isinstance(pattern, list): pattern = '|'.join(pattern)
    pattern_compile = re.compile(pattern, flags)
    for res in pattern_compile.finditer(text):
        if not forbid_wrapper:
            mask[res.span()[0]:res.span()[1]] = TRANSFORM
        else:
            mask[res.regs[0][0]: res.regs[1][0]] = PRESERVE # '\\begin{abstract}'
            mask[res.regs[1][0]: res.regs[1][1]] = TRANSFORM # abstract
            mask[res.regs[1][1]: res.regs[0][1]] = PRESERVE # '\\end{abstract}'
    return text, mask
def set_forbidden_text_careful_brace(text, mask, pattern, flags=0):
    r"""
    Add a preserved text area to the paper (the text becomes untouchable for GPT).
    Count braces so as to capture the complete text area.
    e.g.
    \caption{blablablablabla\textbf{blablabla}blablabla.}
    """
    pattern_compile = re.compile(pattern, flags)
    for res in pattern_compile.finditer(text):
        brace_level = -1
        p = begin = end = res.regs[0][0]
        for _ in range(1024*16):
            if text[p] == '}' and brace_level == 0: break
            elif text[p] == '}': brace_level -= 1
            elif text[p] == '{': brace_level += 1
            p += 1
        end = p+1
        mask[begin:end] = PRESERVE
    return text, mask
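The brace-counting helpers scan forward from the match until the brace depth returns to zero, giving up after 16 384 characters (`1024*16`). A sketch of the same scan that starts at the opening brace and raises `ValueError` when the braces never balance, instead of indexing past the end of the text:

```python
def balanced_brace_end(text: str, start: int) -> int:
    """Given text[start] == '{', return the index just past its matching '}'."""
    if text[start] != '{':
        raise ValueError("start must point at '{'")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == '{':
            depth += 1
        elif text[i] == '}':
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced braces")
```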
def reverse_forbidden_text_careful_brace(text, mask, pattern, flags=0, forbid_wrapper=True):
    r"""
    Move an area out of the preserved area (make the text editable for GPT).
    Count braces so as to capture the complete text area.
    e.g.
    \caption{blablablablabla\textbf{blablabla}blablabla.}
    """
    pattern_compile = re.compile(pattern, flags)
    for res in pattern_compile.finditer(text):
        brace_level = 0
        p = begin = end = res.regs[1][0]
        for _ in range(1024*16):
            if text[p] == '}' and brace_level == 0: break
            elif text[p] == '}': brace_level -= 1
            elif text[p] == '{': brace_level += 1
            p += 1
        end = p
        mask[begin:end] = TRANSFORM
        if forbid_wrapper:
            mask[res.regs[0][0]:begin] = PRESERVE
            mask[end:res.regs[0][1]] = PRESERVE
    return text, mask
def set_forbidden_text_begin_end(text, mask, pattern, flags=0, limit_n_lines=42):
    r"""
    Find every \begin{} ... \end{} block with fewer than limit_n_lines lines
    and add it to the preserved area. Whitelisted environments (document,
    abstract, itemize, ...) and longer blocks are not preserved wholesale;
    the search recurses into their content instead.
    """
    pattern_compile = re.compile(pattern, flags)
    def search_with_line_limit(text, mask):
        for res in pattern_compile.finditer(text):
            cmd = res.group(1)  # begin{what}
            this = res.group(2) # content between begin and end
            this_mask = mask[res.regs[2][0]:res.regs[2][1]]
            white_list = ['document', 'abstract', 'lemma', 'definition', 'sproof',
                          'em', 'emph', 'textit', 'textbf', 'itemize', 'enumerate']
            if (cmd in white_list) or this.count('\n') >= limit_n_lines: # use a magical number 42
                this, this_mask = search_with_line_limit(this, this_mask)
                mask[res.regs[2][0]:res.regs[2][1]] = this_mask
            else:
                mask[res.regs[0][0]:res.regs[0][1]] = PRESERVE
        return text, mask
    return search_with_line_limit(text, mask)
"""
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Latex Merge File
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
"""
def find_main_tex_file(file_manifest, mode):
    r"""
    Among the .tex files of a multi-file project, find the main file, which must
    contain \documentclass. If exactly one file qualifies, it is returned; if several
    do, they are scored and the highest-scoring one is returned.
    P.S. Hopefully nobody passes a LaTeX template in as well (6.25: added code to detect LaTeX templates).
    """
    candidates = []
    for texf in file_manifest:
        if os.path.basename(texf).startswith('merge'):
            continue
        with open(texf, 'r', encoding='utf8', errors='ignore') as f:
            file_content = f.read()
        if r'\documentclass' in file_content:
            candidates.append(texf)
        else:
            continue
    if len(candidates) == 0:
        raise RuntimeError('无法找到一个主Tex文件包含documentclass关键字')
    elif len(candidates) == 1:
        return candidates[0]
    else: # len(candidates) >= 2: penalize words common in LaTeX templates but rare in paper bodies, reward body commands, return the best score
        candidates_score = []
        # words that suggest a template document count as penalties
        unexpected_words = [r'\LaTeX', 'manuscript', 'Guidelines', 'font', 'citations', 'rejected', 'blind review', 'reviewers']
        expected_words = [r'\input', r'\ref', r'\cite']
        for texf in candidates:
            candidates_score.append(0)
            with open(texf, 'r', encoding='utf8', errors='ignore') as f:
                file_content = f.read()
                file_content = rm_comments(file_content)
            for uw in unexpected_words:
                if uw in file_content:
                    candidates_score[-1] -= 1
            for uw in expected_words:
                if uw in file_content:
                    candidates_score[-1] += 1
        select = np.argmax(candidates_score) # return the highest-scoring candidate
        return candidates[select]
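`find_main_tex_file` breaks ties between several `\documentclass` files with a keyword score: +1 for each body command present, -1 for each template-flavoured word. A sketch of the scoring alone. The patterns are raw strings on purpose: in an ordinary string literal, `'\ref'` begins with the carriage-return escape `\r` and can never match the LaTeX command `\ref`.

```python
from typing import List

UNEXPECTED = [r'\LaTeX', 'manuscript', 'Guidelines', 'font', 'citations',
              'rejected', 'blind review', 'reviewers']
EXPECTED = [r'\input', r'\ref', r'\cite']


def template_score(content: str) -> int:
    """+1 per expected body command present, -1 per template-ish word present."""
    score = sum(1 for w in EXPECTED if w in content)
    score -= sum(1 for w in UNEXPECTED if w in content)
    return score


def pick_main(contents: List[str]) -> int:
    """Index of the highest-scoring candidate (first one wins ties, like np.argmax)."""
    scores = [template_score(c) for c in contents]
    return scores.index(max(scores))
```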
def rm_comments(main_file):
    new_file_remove_comment_lines = []
    for l in main_file.splitlines():
        # drop lines that consist entirely of a comment
        if l.lstrip().startswith("%"):
            pass
        else:
            new_file_remove_comment_lines.append(l)
    main_file = '\n'.join(new_file_remove_comment_lines)
    # main_file = re.sub(r"\\include{(.*?)}", r"\\input{\1}", main_file)  # convert \include commands to \input
    main_file = re.sub(r'(?<!\\)%.*', '', main_file)  # use a regex to find trailing (half-line) comments and replace them with an empty string
    return main_file
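`rm_comments` removes comments in two passes: whole-line comments first, then everything from a `%` not preceded by a backslash to the end of the line. A standalone sketch of the same two passes. Like the original it is a heuristic: for example `\\%` (a forced line break followed by a comment) is left untouched, because its `%` is preceded by a backslash.

```python
import re


def strip_latex_comments(src: str) -> str:
    """Drop whole-line comments, then cut unescaped trailing '%' comments."""
    lines = [l for l in src.splitlines() if not l.lstrip().startswith('%')]
    joined = '\n'.join(lines)
    # (?<!\\) keeps escaped percent signs such as '50\%' intact.
    return re.sub(r'(?<!\\)%.*', '', joined)
```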
def find_tex_file_ignore_case(fp):
    dir_name = os.path.dirname(fp)
    base_name = os.path.basename(fp)
    # if the given path is correct as-is
    if os.path.exists(pj(dir_name, base_name)): return pj(dir_name, base_name)
    # if not, try appending the .tex suffix
    if not base_name.endswith('.tex'): base_name+='.tex'
    if os.path.exists(pj(dir_name, base_name)): return pj(dir_name, base_name)
    # if it still cannot be found, ignore case and try again
    import glob
    for f in glob.glob(dir_name+'/*.tex'):
        base_name_s = os.path.basename(fp)
        base_name_f = os.path.basename(f)
        if base_name_s.lower() == base_name_f.lower(): return f
        # try again with the .tex suffix appended
        if not base_name_s.endswith('.tex'): base_name_s+='.tex'
        if base_name_s.lower() == base_name_f.lower(): return f
    return None
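`find_tex_file_ignore_case` tries the path as given, then with a `.tex` suffix, then a case-insensitive match among the directory's `.tex` files; the last step helps when a project written on a case-insensitive file system is unpacked on Linux. A compact sketch of the same lookup order:

```python
import glob
import os
from typing import Optional


def find_tex_ignore_case(fp: str) -> Optional[str]:
    """Resolve fp to an existing .tex file: exact path, then '.tex' suffix,
    then a case-insensitive match in the same directory."""
    dir_name, base = os.path.split(fp)
    candidates = [base] if base.endswith('.tex') else [base, base + '.tex']
    for name in candidates:
        path = os.path.join(dir_name, name)
        if os.path.exists(path):
            return path
    wanted = {name.lower() for name in candidates}
    for path in glob.glob(os.path.join(dir_name, '*.tex')):
        if os.path.basename(path).lower() in wanted:
            return path
    return None
```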
def merge_tex_files_(project_foler, main_file, mode):
    """
    Merge a TeX project recursively
    """
    main_file = rm_comments(main_file)
    for s in reversed([q for q in re.finditer(r"\\input\{(.*?)\}", main_file, re.M)]):
        f = s.group(1)
        fp = os.path.join(project_foler, f)
        fp_ = find_tex_file_ignore_case(fp)
        if fp_:
            with open(fp_, 'r', encoding='utf-8', errors='replace') as fx: c = fx.read()
        else:
            raise RuntimeError(f'找不到{fp},Tex源文件缺失')
        c = merge_tex_files_(project_foler, c, mode)
        main_file = main_file[:s.span()[0]] + c + main_file[s.span()[1]:]
    return main_file
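`merge_tex_files_` inlines `\input{...}` recursively and replaces matches from last to first, so earlier match offsets stay valid while the string grows. The sketch below does the same over an in-memory `files` dict, a hypothetical stand-in for the project folder, and adds an explicit depth limit that turns a cyclic include into a clear error.

```python
import re
from typing import Dict


def expand_inputs(source: str, files: Dict[str, str], depth: int = 0) -> str:
    """Recursively replace \\input{name} with the content of files[name + '.tex']."""
    if depth > 20:
        raise RecursionError("\\input nesting too deep (cyclic include?)")
    # Replace from the last match backwards so earlier spans stay valid.
    for m in reversed(list(re.finditer(r"\\input\{(.*?)\}", source))):
        name = m.group(1)
        key = name if name.endswith('.tex') else name + '.tex'
        if key not in files:
            raise FileNotFoundError(key)
        body = expand_inputs(files[key], files, depth + 1)
        source = source[:m.start()] + body + source[m.end():]
    return source
```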
def merge_tex_files(project_foler, main_file, mode):
    """
    Merge a TeX project recursively.
    P.S. Also inserts the ctex package to support Chinese.
    P.S. Also strips LaTeX comments.
    """
    main_file = merge_tex_files_(project_foler, main_file, mode)
    main_file = rm_comments(main_file)
    if mode == 'translate_zh':
        # find paper documentclass
        pattern = re.compile(r'\\documentclass.*\n')
        match = pattern.search(main_file)
        assert match is not None, "Cannot find documentclass statement!"
        position = match.end()
        add_ctex = '\\usepackage{ctex}\n'
        add_url = '\\usepackage{url}\n' if '{url}' not in main_file else ''
        main_file = main_file[:position] + add_ctex + add_url + main_file[position:]
        # fontset=windows
        main_file = re.sub(r"\\documentclass\[(.*?)\]{(.*?)}", r"\\documentclass[\1,fontset=windows,UTF8]{\2}",main_file)
        main_file = re.sub(r"\\documentclass{(.*?)}", r"\\documentclass[fontset=windows,UTF8]{\1}",main_file)
        # find paper abstract
        pattern_opt1 = re.compile(r'\\begin\{abstract\}.*\n')
        pattern_opt2 = re.compile(r"\\abstract\{(.*?)\}", flags=re.DOTALL)
        match_opt1 = pattern_opt1.search(main_file)
        match_opt2 = pattern_opt2.search(main_file)
        assert (match_opt1 is not None) or (match_opt2 is not None), "Cannot find paper abstract section!"
    return main_file
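For `translate_zh`, `merge_tex_files` inserts `\usepackage{ctex}` (and `\usepackage{url}` if absent) directly after the `\documentclass` line, then adds the `fontset=windows,UTF8` class options whether or not the class already has options. The same transformation as a standalone function; the name `add_ctex` is illustrative only.

```python
import re


def add_ctex(main_file: str) -> str:
    """Insert \\usepackage{ctex} (and url if absent) right after \\documentclass."""
    match = re.search(r'\\documentclass.*\n', main_file)
    if match is None:
        raise ValueError("Cannot find documentclass statement!")
    extra = '\\usepackage{ctex}\n'
    if '{url}' not in main_file:
        extra += '\\usepackage{url}\n'
    main_file = main_file[:match.end()] + extra + main_file[match.end():]
    # Append fontset/UTF8 to existing options, or add an option list if there is none.
    main_file = re.sub(r"\\documentclass\[(.*?)\]\{(.*?)\}",
                       r"\\documentclass[\1,fontset=windows,UTF8]{\2}", main_file)
    main_file = re.sub(r"\\documentclass\{(.*?)\}",
                       r"\\documentclass[fontset=windows,UTF8]{\1}", main_file)
    return main_file
```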
"""
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Post process
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
"""
def mod_inbraket(match):
    r"""
    Why does ChatGPT turn the commas inside \cite{} into full-width Chinese commas?
    """
    # get the matched string
    cmd = match.group(1)
    str_to_modify = match.group(2)
    # modify the matched string
    str_to_modify = str_to_modify.replace(':', ':') # full-width colon -> ASCII colon
    str_to_modify = str_to_modify.replace(',', ',') # full-width comma -> ASCII comma
    # str_to_modify = 'BOOM'
    return "\\" + cmd + "{" + str_to_modify + "}"
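`mod_inbraket` normalises full-width punctuation inside short commands such as `\cite{...}`. Writing the characters as escapes (`\uff1a` for the full-width colon, `\uff0c` for the full-width comma) keeps them from being lost in copy-paste; with an empty first argument, `str.replace('', ':')` would instead insert a colon around every character. A sketch; the function names are illustrative:

```python
import re


def _normalize(m):
    # Replace the full-width colon (U+FF1A) and comma (U+FF0C) inside the braces.
    body = m.group(2).replace('\uff1a', ':').replace('\uff0c', ',')
    return "\\" + m.group(1) + "{" + body + "}"


def fix_cite_punctuation(tex: str) -> str:
    return re.sub(r"\\([a-z]{2,10})\{([^\}]*?)\}", _normalize, tex)
```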
def fix_content(final_tex, node_string):
    """
    Fix common GPT errors to increase the success rate
    """
    final_tex = re.sub(r"(?<!\\)%", "\\%", final_tex)
    final_tex = re.sub(r"\\([a-z]{2,10})\ \{", r"\\\1{", string=final_tex)
    final_tex = re.sub(r"\\\ ([a-z]{2,10})\{", r"\\\1{", string=final_tex)
    final_tex = re.sub(r"\\([a-z]{2,10})\{([^\}]*?)\}", mod_inbraket, string=final_tex)
    if "Traceback" in final_tex and "[Local Message]" in final_tex:
        final_tex = node_string # something went wrong; restore the original text
    if node_string.count('\\begin') != final_tex.count('\\begin'):
        final_tex = node_string # something went wrong; restore the original text
    if node_string.count(r'\_') > 0 and node_string.count(r'\_') > final_tex.count(r'\_'):
        # walk and replace any _ without a preceding backslash
        final_tex = re.sub(r"(?<!\\)_", "\\_", final_tex)
    def compute_brace_level(string):
        # this function counts the number of { and }
        brace_level = 0
        for c in string:
            if c == "{": brace_level += 1
            elif c == "}": brace_level -= 1
        return brace_level
    def join_most(tex_t, tex_o):
        # this function joins the translated string and the original string when something goes wrong
        p_t = 0
        p_o = 0
        def find_next(string, chars, begin):
            p = begin
            while p < len(string):
                if string[p] in chars: return p, string[p]
                p += 1
            return None, None
        while True:
            res1, char = find_next(tex_o, ['{','}'], p_o)
            if res1 is None: break
            res2, char = find_next(tex_t, [char], p_t)
            if res2 is None: break
            p_o = res1 + 1
            p_t = res2 + 1
        return tex_t[:p_t] + tex_o[p_o:]
    if compute_brace_level(final_tex) != compute_brace_level(node_string):
        # something went wrong; restore part of the original text so the braces stay correct
        final_tex = join_most(final_tex, node_string)
    return final_tex
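`fix_content` compares the net brace level of the GPT output with that of the source segment. On a mismatch, `join_most` keeps the translated text up to the last brace it can pair, in order, with the source, and takes the rest from the source. The sketch below reproduces that heuristic. It repairs simple cases such as a dropped closing brace, but it does not guarantee a balanced result in general.

```python
def brace_level(s: str) -> int:
    """Net count of '{' minus '}'."""
    return s.count('{') - s.count('}')


def join_most(translated: str, original: str) -> str:
    """Keep the translated text up to the last brace it shares (in order) with the
    original, then fall back to the original for the rest."""
    p_t = p_o = 0
    while True:
        # next brace in the original
        i_o = next((i for i in range(p_o, len(original)) if original[i] in '{}'), None)
        if i_o is None:
            break
        # the same brace character in the translation
        i_t = translated.find(original[i_o], p_t)
        if i_t == -1:
            break
        p_o, p_t = i_o + 1, i_t + 1
    return translated[:p_t] + original[p_o:]
```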
def compile_latex_with_timeout(command, cwd, timeout=60):
    import subprocess
    process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=cwd)
    try:
        stdout, stderr = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        stdout, stderr = process.communicate()
        print("Process timed out!")
        return False
    # note: returns True even if the command exited with an error; callers check for the output files
    return True
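`compile_latex_with_timeout` returns False only on a timeout; a command that exits with an error still returns True, which is why the caller checks whether the PDF exists. As an alternative sketch (not the project's code), `subprocess.run` can enforce the timeout and report the exit status at the same time:

```python
import subprocess
import sys


def run_with_timeout(args, cwd=None, timeout=60.0):
    """Run a command; return (finished_in_time, returncode).

    Passing an argument list (no shell) avoids quoting problems with file names.
    """
    try:
        proc = subprocess.run(args, cwd=cwd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return False, None
    return True, proc.returncode
```

For example, `run_with_timeout(['pdflatex', '-interaction=batchmode', '-file-line-error', 'merge.tex'], cwd=work_folder)`. Commands that rely on shell redirection, such as the `latexdiff ... > merge_diff.tex` call, would need their output written to a file explicitly when no shell is used.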
def merge_pdfs(pdf1_path, pdf2_path, output_path):
    # Uses the legacy PyPDF2 1.x API (PdfFileReader, getPage, mediaBox, mergeTranslatedPage), which PyPDF2 3.0 removed
    import PyPDF2
    Percent = 0.95
    # Open the first PDF file
    with open(pdf1_path, 'rb') as pdf1_file:
        pdf1_reader = PyPDF2.PdfFileReader(pdf1_file)
        # Open the second PDF file
        with open(pdf2_path, 'rb') as pdf2_file:
            pdf2_reader = PyPDF2.PdfFileReader(pdf2_file)
            # Create a new PDF file to store the merged pages
            output_writer = PyPDF2.PdfFileWriter()
            # Determine the number of pages in each PDF file
            num_pages = max(pdf1_reader.numPages, pdf2_reader.numPages)
            # Merge the pages from the two PDF files
            for page_num in range(num_pages):
                # Add the page from the first PDF file
                if page_num < pdf1_reader.numPages:
                    page1 = pdf1_reader.getPage(page_num)
                else:
                    page1 = PyPDF2.PageObject.createBlankPage(pdf1_reader)
                # Add the page from the second PDF file
                if page_num < pdf2_reader.numPages:
                    page2 = pdf2_reader.getPage(page_num)
                else:
                    page2 = PyPDF2.PageObject.createBlankPage(pdf1_reader)
                # Create a blank page of width width1 + Percent * width2 (page 2 overlaps page 1 by (1 - Percent) of its width)
                new_page = PyPDF2.PageObject.createBlankPage(
                    width = int(int(page1.mediaBox.getWidth()) + int(page2.mediaBox.getWidth()) * Percent),
                    height = max(page1.mediaBox.getHeight(), page2.mediaBox.getHeight())
                )
                new_page.mergeTranslatedPage(page1, 0, 0)
                new_page.mergeTranslatedPage(page2, int(int(page1.mediaBox.getWidth())-int(page2.mediaBox.getWidth())* (1-Percent)), 0)
                output_writer.addPage(new_page)
            # Save the merged PDF file
            with open(output_path, 'wb') as output_file:
                output_writer.write(output_file)

@@ -0,0 +1,129 @@
import time, logging, json
class AliyunASR():
    def test_on_sentence_begin(self, message, *args):
        # print("test_on_sentence_begin:{}".format(message))
        pass
    def test_on_sentence_end(self, message, *args):
        # print("test_on_sentence_end:{}".format(message))
        message = json.loads(message)
        self.parsed_sentence = message['payload']['result']
        self.event_on_entence_end.set()
        # print(self.parsed_sentence)
    def test_on_start(self, message, *args):
        # print("test_on_start:{}".format(message))
        pass
    def test_on_error(self, message, *args):
        logging.error("on_error args=>{}".format(args))
        pass
    def test_on_close(self, *args):
        self.aliyun_service_ok = False
        pass
    def test_on_result_chg(self, message, *args):
        # print("test_on_chg:{}".format(message))
        message = json.loads(message)
        self.parsed_text = message['payload']['result']
        self.event_on_result_chg.set()
    def test_on_completed(self, message, *args):
        # print("on_completed:args=>{} message=>{}".format(args, message))
        pass
    def audio_convertion_thread(self, uuid):
        # capture audio in a background thread
        import nls  # pip install git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git
        import tempfile
        from scipy import io
        from toolbox import get_conf
        from .audio_io import change_sample_rate
        from .audio_io import RealtimeAudioDistribution
        NEW_SAMPLERATE = 16000
        rad = RealtimeAudioDistribution()
        rad.clean_up()
        temp_folder = tempfile.gettempdir()
        TOKEN, APPKEY = get_conf('ALIYUN_TOKEN', 'ALIYUN_APPKEY')
        if len(TOKEN) == 0:
            TOKEN = self.get_token()
        self.aliyun_service_ok = True
        URL="wss://nls-gateway.aliyuncs.com/ws/v1"
        sr = nls.NlsSpeechTranscriber(
            url=URL,
            token=TOKEN,
            appkey=APPKEY,
            on_sentence_begin=self.test_on_sentence_begin,
            on_sentence_end=self.test_on_sentence_end,
            on_start=self.test_on_start,
            on_result_changed=self.test_on_result_chg,
            on_completed=self.test_on_completed,
            on_error=self.test_on_error,
            on_close=self.test_on_close,
            callback_args=[uuid.hex]
        )
        r = sr.start(aformat="pcm",
                     enable_intermediate_result=True,
                     enable_punctuation_prediction=True,
                     enable_inverse_text_normalization=True)
        while not self.stop:
            # time.sleep(self.capture_interval)
            audio = rad.read(uuid.hex)
            if audio is not None:
                # write the resampled audio to a temp file (scipy.io.wavfile writes a WAV container, header included)
                temp_file = f'{temp_folder}/{uuid.hex}.pcm'
                dsdata = change_sample_rate(audio, rad.rate, NEW_SAMPLERATE) # 48000 --> 16000
                io.wavfile.write(temp_file, NEW_SAMPLERATE, dsdata)
                # read the file back as bytes
                with open(temp_file, "rb") as f: data = f.read()
                # print('audio len:', len(audio), '\t ds len:', len(dsdata), '\t need n send:', len(data)//640)
                slices = zip(*(iter(data),) * 640)    # groups of 640 bytes; zip drops a trailing partial group
                for i in slices: sr.send_audio(bytes(i))
            else:
                time.sleep(0.1)
            if not self.aliyun_service_ok:
                self.stop = True
                self.stop_msg = 'Aliyun音频服务异常,请检查ALIYUN_TOKEN和ALIYUN_APPKEY是否过期。'
        r = sr.stop()
    def get_token(self):
        from toolbox import get_conf
        import json
        from aliyunsdkcore.request import CommonRequest
        from aliyunsdkcore.client import AcsClient
        AccessKey_ID, AccessKey_secret = get_conf('ALIYUN_ACCESSKEY', 'ALIYUN_SECRET')
        # create an AcsClient instance
        client = AcsClient(
            AccessKey_ID,
            AccessKey_secret,
            "cn-shanghai"
        )
        # create the request and set its parameters
        request = CommonRequest()
        request.set_method('POST')
        request.set_domain('nls-meta.cn-shanghai.aliyuncs.com')
        request.set_version('2019-02-28')
        request.set_action_name('CreateToken')
        token = None  # stays None if the request fails or the response carries no token
        try:
            response = client.do_action_with_exception(request)
            print(response)
            jss = json.loads(response)
            if 'Token' in jss and 'Id' in jss['Token']:
                token = jss['Token']['Id']
                expireTime = jss['Token']['ExpireTime']
                print("token = " + token)
                print("expireTime = " + str(expireTime))
        except Exception as e:
            print(e)
        return token
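In `audio_convertion_thread`, the idiom `zip(*(iter(data),) * 640)` groups the bytes into 640-byte slices (320 16-bit samples, i.e. 20 ms at 16 kHz) but silently drops a final partial slice. Also, `scipy.io.wavfile.write` writes a WAV file with a RIFF header, so the bytes read back from the `.pcm` temp file start with header bytes rather than samples. A sketch that produces raw little-endian 16-bit PCM straight from the array and keeps the short tail, assuming raw PCM is what `sr.start(aformat="pcm")` expects; the function name is hypothetical.

```python
from typing import Iterator

import numpy as np


def pcm_chunks(samples: np.ndarray, chunk_bytes: int = 640) -> Iterator[bytes]:
    """Yield raw little-endian int16 PCM in fixed-size chunks, keeping the short tail."""
    data = samples.astype('<i2').tobytes()  # raw PCM, no WAV header
    for start in range(0, len(data), chunk_bytes):
        yield data[start:start + chunk_bytes]
```

Usage would look like `for chunk in pcm_chunks(dsdata): sr.send_audio(chunk)`, with no temp file involved.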

@@ -0,0 +1,51 @@
import numpy as np
from scipy import interpolate
def Singleton(cls):
    _instance = {}
    def _singleton(*args, **kargs):
        if cls not in _instance:
            _instance[cls] = cls(*args, **kargs)
        return _instance[cls]
    return _singleton
@Singleton
class RealtimeAudioDistribution():
    def __init__(self) -> None:
        self.data = {}
        self.max_len = 1024*1024
        self.rate = 48000 # read-only: samples per second
    def clean_up(self):
        self.data = {}
    def feed(self, uuid, audio):
        self.rate, audio_ = audio
        # print('feed', len(audio_), audio_[-25:])
        if uuid not in self.data:
            self.data[uuid] = audio_
        else:
            new_arr = np.concatenate((self.data[uuid], audio_))
            if len(new_arr) > self.max_len: new_arr = new_arr[-self.max_len:]
            self.data[uuid] = new_arr
    def read(self, uuid):
        if uuid in self.data:
            res = self.data.pop(uuid)
            print('\r read-', len(res), '-', max(res), end='', flush=True)
        else:
            res = None
        return res
def change_sample_rate(audio, old_sr, new_sr):
    duration = audio.shape[0] / old_sr
    time_old = np.linspace(0, duration, audio.shape[0])
    time_new = np.linspace(0, duration, int(audio.shape[0] * new_sr / old_sr))
    interpolator = interpolate.interp1d(time_old, audio.T)
    new_audio = interpolator(time_new).T
    return new_audio.astype(np.int16)
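`change_sample_rate` resamples by linear interpolation over a shared time axis using `scipy.interpolate.interp1d`. For a 1-D signal, `numpy.interp` computes the same linear interpolation without SciPy. Neither version applies a low-pass filter first, so content above the new Nyquist frequency (8 kHz for 16 kHz output) aliases; `scipy.signal.resample_poly` is one filtered alternative. A sketch:

```python
import numpy as np


def resample_linear(audio: np.ndarray, old_sr: int, new_sr: int) -> np.ndarray:
    """Linear-interpolation resampling of a 1-D int16 signal (no anti-aliasing filter)."""
    n_new = int(audio.shape[0] * new_sr / old_sr)
    duration = audio.shape[0] / old_sr
    t_old = np.linspace(0, duration, audio.shape[0])
    t_new = np.linspace(0, duration, n_new)
    return np.interp(t_new, t_old, audio.astype(np.float64)).astype(np.int16)
```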

@@ -0,0 +1,171 @@
from functools import lru_cache
from toolbox import gen_time_str
from toolbox import write_history_to_file, promote_file_to_downloadzone
from toolbox import get_conf
from toolbox import ProxyNetworkActivate
from colorful import *
import requests
import random
import copy
import os
import math
class GROBID_OFFLINE_EXCEPTION(Exception): pass
def get_avail_grobid_url():
    GROBID_URLS, = get_conf('GROBID_URLS')
    if len(GROBID_URLS) == 0: return None
    try:
        _grobid_url = random.choice(GROBID_URLS) # random load balancing
        if _grobid_url.endswith('/'): _grobid_url = _grobid_url.rstrip('/')
        with ProxyNetworkActivate('Connect_Grobid'):
            res = requests.get(_grobid_url+'/api/isalive')
        if res.text=='true': return _grobid_url
        else: return None
    except:
        return None
@lru_cache(maxsize=32)
def parse_pdf(pdf_path, grobid_url):
    import scipdf # pip install scipdf_parser
    if grobid_url.endswith('/'): grobid_url = grobid_url.rstrip('/')
    try:
        with ProxyNetworkActivate('Connect_Grobid'):
            article_dict = scipdf.parse_pdf_to_dict(pdf_path, grobid_url=grobid_url)
    except GROBID_OFFLINE_EXCEPTION:
        raise GROBID_OFFLINE_EXCEPTION("The GROBID service is unavailable. Please change GROBID_URLS in the config; you can point it to a local GROBID service.")
    except:
        raise RuntimeError("Failed to parse the PDF. Please check whether the PDF file is corrupted.")
    return article_dict

def produce_report_markdown(gpt_response_collection, meta, paper_meta_info, chatbot, fp, generated_conclusion_files):
    # -=-=-=-=-=-=-=-= Write file 1: original and translation interleaved -=-=-=-=-=-=-=-=
    res_path = write_history_to_file(meta + ["# Meta Translation" , paper_meta_info] + gpt_response_collection, file_basename=f"{gen_time_str()}-translated_and_original.md", file_fullname=None)
    promote_file_to_downloadzone(res_path, rename_file=os.path.basename(res_path)+'.md', chatbot=chatbot)
    generated_conclusion_files.append(res_path)

    # -=-=-=-=-=-=-=-= Write file 2: translated text only -=-=-=-=-=-=-=-=
    translated_res_array = []
    # Track the current top-level section title:
    last_section_name = ""
    for index, value in enumerate(gpt_response_collection):
        # Odd indices hold the GPT responses; even indices hold the corresponding inputs:
        if index % 2 != 0:
            # Extract the current (English) section title from the preceding input:
            cur_section_name = gpt_response_collection[index-1].split('\n')[0].split(" Part")[0]
            # Emit the section title only when a new section starts, not for Part-2, Part-3, ...
            if cur_section_name != last_section_name:
                cur_value = cur_section_name + '\n'
                last_section_name = cur_section_name
            else:
                cur_value = ""
            # Append the translated text; the section title keeps its original (English) wording
            cur_value += value
            translated_res_array.append(cur_value)
    res_path = write_history_to_file(meta + ["# Meta Translation" , paper_meta_info] + translated_res_array,
                                     file_basename = f"{gen_time_str()}-translated_only.md",
                                     file_fullname = None,
                                     auto_caption = False)
    promote_file_to_downloadzone(res_path, rename_file=os.path.basename(res_path)+'.md', chatbot=chatbot)
    generated_conclusion_files.append(res_path)
    return res_path

def translate_pdf(article_dict, llm_kwargs, chatbot, fp, generated_conclusion_files, TOKEN_LIMIT_PER_FRAGMENT, DST_LANG):
    from crazy_functions.crazy_utils import construct_html
    from crazy_functions.crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf
    from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
    from crazy_functions.crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency

    prompt = "The following is the basic information of an academic paper:\n"
    # title
    title = article_dict.get('title', 'title unavailable'); prompt += f'title:{title}\n\n'
    # authors
    authors = article_dict.get('authors', 'authors unavailable'); prompt += f'authors:{authors}\n\n'
    # abstract
    abstract = article_dict.get('abstract', 'abstract unavailable'); prompt += f'abstract:{abstract}\n\n'
    # command
    prompt += f"Please translate the title and abstract into {DST_LANG}"
    meta = ['# Title:\n\n', title, '# Abstract:\n\n', abstract ]

    # Single thread: get the paper's meta information
    paper_meta_info = yield from request_gpt_model_in_new_thread_with_ui_alive(
        inputs=prompt,
        inputs_show_user=prompt,
        llm_kwargs=llm_kwargs,
        chatbot=chatbot, history=[],
        sys_prompt="You are an academic paper reader.",
    )

    # Multiple threads: translate
    inputs_array = []
    inputs_show_user_array = []

    # get_token_num
    from request_llm.bridge_all import model_info
    enc = model_info[llm_kwargs['llm_model']]['tokenizer']
    def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))

    def break_down(txt):
        raw_token_num = get_token_num(txt)
        if raw_token_num <= TOKEN_LIMIT_PER_FRAGMENT:
            return [txt]
        else:
            # raw_token_num > TOKEN_LIMIT_PER_FRAGMENT
            # find a smooth token limit to achieve even separation
            count = int(math.ceil(raw_token_num / TOKEN_LIMIT_PER_FRAGMENT))
            token_limit_smooth = raw_token_num // count + count
            return breakdown_txt_to_satisfy_token_limit_for_pdf(txt, get_token_fn=get_token_num, limit=token_limit_smooth)

    for section in article_dict.get('sections', []):
        if len(section['text']) == 0: continue
        section_frags = break_down(section['text'])
        for i, fragment in enumerate(section_frags):
            heading = section['heading']
            if len(section_frags) > 1: heading += f' Part-{i+1}'
            inputs_array.append(
                f"You need to translate the {heading} section. Its content is as follows: \n\n{fragment}"
            )
            inputs_show_user_array.append(
                f"# {heading}\n\n{fragment}"
            )

    gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
        inputs_array=inputs_array,
        inputs_show_user_array=inputs_show_user_array,
        llm_kwargs=llm_kwargs,
        chatbot=chatbot,
        history_array=[meta for _ in inputs_array],
        sys_prompt_array=[
            f"Act as an academic translator and translate the academic paper accurately into {DST_LANG}. Make sure every sentence in the text is translated." for _ in inputs_array],
    )
    # -=-=-=-=-=-=-=-= Write the Markdown files -=-=-=-=-=-=-=-=
    produce_report_markdown(gpt_response_collection, meta, paper_meta_info, chatbot, fp, generated_conclusion_files)

    # -=-=-=-=-=-=-=-= Write the HTML file -=-=-=-=-=-=-=-=
    ch = construct_html()
    orig = ""
    trans = ""
    gpt_response_collection_html = copy.deepcopy(gpt_response_collection)
    for i,k in enumerate(gpt_response_collection_html):
        if i%2==0:
            gpt_response_collection_html[i] = inputs_show_user_array[i//2]
        else:
            # Extract the current (English) section title first:
            cur_section_name = gpt_response_collection[i-1].split('\n')[0].split(" Part")[0]
            cur_value = cur_section_name + "\n" + gpt_response_collection_html[i]
            gpt_response_collection_html[i] = cur_value

    final = ["", "", "I. Paper Overview", "", "Abstract", paper_meta_info, "II. Paper Translation", ""]
    final.extend(gpt_response_collection_html)
    for i, k in enumerate(final):
        if i%2==0:
            orig = k
        if i%2==1:
            trans = k
            ch.add_row(a=orig, b=trans)
    create_report_file_name = f"{os.path.basename(fp)}.trans.html"
    html_file = ch.save_file(create_report_file_name)
    generated_conclusion_files.append(html_file)
    promote_file_to_downloadzone(html_file, rename_file=os.path.basename(html_file), chatbot=chatbot)
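`break_down` inside `translate_pdf` does not cut a long section at exactly `TOKEN_LIMIT_PER_FRAGMENT`. Instead, it first works out how many fragments are needed and then spreads the tokens evenly across them. The sketch below reproduces only that arithmetic. `smooth_token_limit` and `split_tokens` are hypothetical helpers. `split_tokens` slices a plain token list, whereas the real `breakdown_txt_to_satisfy_token_limit_for_pdf` splits the actual text (presumably at paragraph or sentence boundaries) and treats the limit as an upper bound.

```python
import math


def smooth_token_limit(raw_token_num: int, token_limit_per_fragment: int) -> int:
    """Per-fragment limit chosen by break_down() in translate_pdf (sketch)."""
    if raw_token_num <= token_limit_per_fragment:
        return raw_token_num  # fits in one fragment; break_down returns [txt]
    count = int(math.ceil(raw_token_num / token_limit_per_fragment))  # fragments needed
    return raw_token_num // count + count  # even share plus a little slack


def split_tokens(tokens, limit):
    """Hypothetical stand-in for the real splitter: consecutive slices of at most `limit` tokens."""
    return [tokens[i:i + limit] for i in range(0, len(tokens), limit)]
```

With 2,100 tokens and a 2,000-token limit, a naive cut produces fragments of 2,000 and 100 tokens. The smoothed limit of 1,052 produces fragments of 1,052 and 1,048 tokens instead. Because `raw // count` never exceeds `TOKEN_LIMIT_PER_FRAGMENT`, the smoothed limit can exceed the configured limit by at most `count` tokens (for example, 4,000 tokens give a limit of 2,002).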

@@ -1,87 +0,0 @@
#include "libipc/buffer.h"
#include "libipc/utility/pimpl.h"

#include <cstring>

namespace ipc {

bool operator==(buffer const & b1, buffer const & b2) {
    return (b1.size() == b2.size()) && (std::memcmp(b1.data(), b2.data(), b1.size()) == 0);
}

bool operator!=(buffer const & b1, buffer const & b2) {
    return !(b1 == b2);
}

class buffer::buffer_ : public pimpl<buffer_> {
public:
    void* p_;
    std::size_t s_;
    void* a_;
    buffer::destructor_t d_;

    buffer_(void* p, std::size_t s, buffer::destructor_t d, void* a)
        : p_(p), s_(s), a_(a), d_(d) {
    }

    ~buffer_() {
        if (d_ == nullptr) return;
        d_((a_ == nullptr) ? p_ : a_, s_);
    }
};

buffer::buffer()
    : buffer(nullptr, 0, nullptr, nullptr) {
}

buffer::buffer(void* p, std::size_t s, destructor_t d)
    : p_(p_->make(p, s, d, nullptr)) {
}

buffer::buffer(void* p, std::size_t s, destructor_t d, void* additional)
    : p_(p_->make(p, s, d, additional)) {
}

buffer::buffer(void* p, std::size_t s)
    : buffer(p, s, nullptr) {
}

buffer::buffer(char const & c)
    : buffer(const_cast<char*>(&c), 1) {
}

buffer::buffer(buffer&& rhs)
    : buffer() {
    swap(rhs);
}

buffer::~buffer() {
    p_->clear();
}

void buffer::swap(buffer& rhs) {
    std::swap(p_, rhs.p_);
}

buffer& buffer::operator=(buffer rhs) {
    swap(rhs);
    return *this;
}

bool buffer::empty() const noexcept {
    return (impl(p_)->p_ == nullptr) || (impl(p_)->s_ == 0);
}

void* buffer::data() noexcept {
    return impl(p_)->p_;
}

void const * buffer::data() const noexcept {
    return impl(p_)->p_;
}

std::size_t buffer::size() const noexcept {
    return impl(p_)->s_;
}

} // namespace ipc

@@ -1,701 +0,0 @@
#include <type_traits>
#include <cstring>
#include <algorithm>
#include <utility> // std::pair, std::move, std::forward
#include <atomic>
#include <type_traits> // aligned_storage_t
#include <string>
#include <vector>
#include <array>
#include <cassert>
#include "libipc/ipc.h"
#include "libipc/def.h"
#include "libipc/shm.h"
#include "libipc/pool_alloc.h"
#include "libipc/queue.h"
#include "libipc/policy.h"
#include "libipc/rw_lock.h"
#include "libipc/waiter.h"
#include "libipc/utility/log.h"
#include "libipc/utility/id_pool.h"
#include "libipc/utility/scope_guard.h"
#include "libipc/utility/utility.h"
#include "libipc/memory/resource.h"
#include "libipc/platform/detail.h"
#include "libipc/circ/elem_array.h"
namespace {
using msg_id_t = std::uint32_t;
using acc_t = std::atomic<msg_id_t>;

template <std::size_t DataSize, std::size_t AlignSize>
struct msg_t;

template <std::size_t AlignSize>
struct msg_t<0, AlignSize> {
    msg_id_t cc_id_;
    msg_id_t id_;
    std::int32_t remain_;
    bool storage_;
};

template <std::size_t DataSize, std::size_t AlignSize>
struct msg_t : msg_t<0, AlignSize> {
    std::aligned_storage_t<DataSize, AlignSize> data_ {};

    msg_t() = default;
    msg_t(msg_id_t cc_id, msg_id_t id, std::int32_t remain, void const * data, std::size_t size)
        : msg_t<0, AlignSize> {cc_id, id, remain, (data == nullptr) || (size == 0)} {
        if (this->storage_) {
            if (data != nullptr) {
                // copy storage-id
                *reinterpret_cast<ipc::storage_id_t*>(&data_) =
                    *static_cast<ipc::storage_id_t const *>(data);
            }
        }
        else std::memcpy(&data_, data, size);
    }
};

template <typename T>
ipc::buff_t make_cache(T& data, std::size_t size) {
    auto ptr = ipc::mem::alloc(size);
    std::memcpy(ptr, &data, (ipc::detail::min)(sizeof(data), size));
    return { ptr, size, ipc::mem::free };
}

struct cache_t {
    std::size_t fill_;
    ipc::buff_t buff_;

    cache_t(std::size_t f, ipc::buff_t && b)
        : fill_(f), buff_(std::move(b))
    {}

    void append(void const * data, std::size_t size) {
        if (fill_ >= buff_.size() || data == nullptr || size == 0) return;
        auto new_fill = (ipc::detail::min)(fill_ + size, buff_.size());
        std::memcpy(static_cast<ipc::byte_t*>(buff_.data()) + fill_, data, new_fill - fill_);
        fill_ = new_fill;
    }
};

auto cc_acc() {
    static ipc::shm::handle acc_h("__CA_CONN__", sizeof(acc_t));
    return static_cast<acc_t*>(acc_h.get());
}

IPC_CONSTEXPR_ std::size_t align_chunk_size(std::size_t size) noexcept {
    return (((size - 1) / ipc::large_msg_align) + 1) * ipc::large_msg_align;
}

IPC_CONSTEXPR_ std::size_t calc_chunk_size(std::size_t size) noexcept {
    return ipc::make_align(alignof(std::max_align_t), align_chunk_size(
           ipc::make_align(alignof(std::max_align_t), sizeof(std::atomic<ipc::circ::cc_t>)) + size));
}

struct chunk_t {
    std::atomic<ipc::circ::cc_t> &conns() noexcept {
        return *reinterpret_cast<std::atomic<ipc::circ::cc_t> *>(this);
    }

    void *data() noexcept {
        return reinterpret_cast<ipc::byte_t *>(this)
             + ipc::make_align(alignof(std::max_align_t), sizeof(std::atomic<ipc::circ::cc_t>));
    }
};

struct chunk_info_t {
    ipc::id_pool<> pool_;
    ipc::spin_lock lock_;

    IPC_CONSTEXPR_ static std::size_t chunks_mem_size(std::size_t chunk_size) noexcept {
        return ipc::id_pool<>::max_count * chunk_size;
    }

    ipc::byte_t *chunks_mem() noexcept {
        return reinterpret_cast<ipc::byte_t *>(this + 1);
    }

    chunk_t *at(std::size_t chunk_size, ipc::storage_id_t id) noexcept {
        if (id < 0) return nullptr;
        return reinterpret_cast<chunk_t *>(chunks_mem() + (chunk_size * id));
    }
};

auto& chunk_storages() {
    class chunk_handle_t {
        ipc::shm::handle handle_;

    public:
        chunk_info_t *get_info(std::size_t chunk_size) {
            if (!handle_.valid() &&
                !handle_.acquire( ("__CHUNK_INFO__" + ipc::to_string(chunk_size)).c_str(),
                                  sizeof(chunk_info_t) + chunk_info_t::chunks_mem_size(chunk_size) )) {
                ipc::error("[chunk_storages] chunk_shm.id_info_.acquire failed: chunk_size = %zd\n", chunk_size);
                return nullptr;
            }
            auto info = static_cast<chunk_info_t*>(handle_.get());
            if (info == nullptr) {
                ipc::error("[chunk_storages] chunk_shm.id_info_.get failed: chunk_size = %zd\n", chunk_size);
                return nullptr;
            }
            return info;
        }
    };
    static ipc::map<std::size_t, chunk_handle_t> chunk_hs;
    return chunk_hs;
}

chunk_info_t *chunk_storage_info(std::size_t chunk_size) {
    auto &storages = chunk_storages();
    std::decay_t<decltype(storages)>::iterator it;
    {
        static ipc::rw_lock lock;
        IPC_UNUSED_ std::shared_lock<ipc::rw_lock> guard {lock};
        if ((it = storages.find(chunk_size)) == storages.end()) {
            using chunk_handle_t = std::decay_t<decltype(storages)>::value_type::second_type;
            guard.unlock();
            IPC_UNUSED_ std::lock_guard<ipc::rw_lock> guard {lock};
            it = storages.emplace(chunk_size, chunk_handle_t{}).first;
        }
    }
    return it->second.get_info(chunk_size);
}

std::pair<ipc::storage_id_t, void*> acquire_storage(std::size_t size, ipc::circ::cc_t conns) {
    std::size_t chunk_size = calc_chunk_size(size);
    auto info = chunk_storage_info(chunk_size);
    if (info == nullptr) return {};

    info->lock_.lock();
    info->pool_.prepare();
    // got a unique id
    auto id = info->pool_.acquire();
    info->lock_.unlock();

    auto chunk = info->at(chunk_size, id);
    if (chunk == nullptr) return {};
    chunk->conns().store(conns, std::memory_order_relaxed);
    return { id, chunk->data() };
}

void *find_storage(ipc::storage_id_t id, std::size_t size) {
    if (id < 0) {
        ipc::error("[find_storage] id is invalid: id = %ld, size = %zd\n", (long)id, size);
        return nullptr;
    }
    std::size_t chunk_size = calc_chunk_size(size);
    auto info = chunk_storage_info(chunk_size);
    if (info == nullptr) return nullptr;
    return info->at(chunk_size, id)->data();
}

void release_storage(ipc::storage_id_t id, std::size_t size) {
    if (id < 0) {
        ipc::error("[release_storage] id is invalid: id = %ld, size = %zd\n", (long)id, size);
        return;
    }
    std::size_t chunk_size = calc_chunk_size(size);
    auto info = chunk_storage_info(chunk_size);
    if (info == nullptr) return;
    info->lock_.lock();
    info->pool_.release(id);
    info->lock_.unlock();
}

template <ipc::relat Rp, ipc::relat Rc>
bool sub_rc(ipc::wr<Rp, Rc, ipc::trans::unicast>,
            std::atomic<ipc::circ::cc_t> &/*conns*/, ipc::circ::cc_t /*curr_conns*/, ipc::circ::cc_t /*conn_id*/) noexcept {
    return true;
}

template <ipc::relat Rp, ipc::relat Rc>
bool sub_rc(ipc::wr<Rp, Rc, ipc::trans::broadcast>,
            std::atomic<ipc::circ::cc_t> &conns, ipc::circ::cc_t curr_conns, ipc::circ::cc_t conn_id) noexcept {
    auto last_conns = curr_conns & ~conn_id;
    for (unsigned k = 0;;) {
        auto chunk_conns = conns.load(std::memory_order_acquire);
        if (conns.compare_exchange_weak(chunk_conns, chunk_conns & last_conns, std::memory_order_release)) {
            return (chunk_conns & last_conns) == 0;
        }
        ipc::yield(k);
    }
}

template <typename Flag>
void recycle_storage(ipc::storage_id_t id, std::size_t size, ipc::circ::cc_t curr_conns, ipc::circ::cc_t conn_id) {
    if (id < 0) {
        ipc::error("[recycle_storage] id is invalid: id = %ld, size = %zd\n", (long)id, size);
        return;
    }
    std::size_t chunk_size = calc_chunk_size(size);
    auto info = chunk_storage_info(chunk_size);
    if (info == nullptr) return;

    auto chunk = info->at(chunk_size, id);
    if (chunk == nullptr) return;

    if (!sub_rc(Flag{}, chunk->conns(), curr_conns, conn_id)) {
        return;
    }
    info->lock_.lock();
    info->pool_.release(id);
    info->lock_.unlock();
}

template <typename MsgT>
bool clear_message(void* p) {
    auto msg = static_cast<MsgT*>(p);
    if (msg->storage_) {
        std::int32_t r_size = static_cast<std::int32_t>(ipc::data_length) + msg->remain_;
        if (r_size <= 0) {
            ipc::error("[clear_message] invalid msg size: %d\n", (int)r_size);
            return true;
        }
        release_storage(
            *reinterpret_cast<ipc::storage_id_t*>(&msg->data_),
            static_cast<std::size_t>(r_size));
    }
    return true;
}

struct conn_info_head {

    ipc::string name_;
    msg_id_t cc_id_; // connection-info id
    ipc::detail::waiter cc_waiter_, wt_waiter_, rd_waiter_;
    ipc::shm::handle acc_h_;

    conn_info_head(char const * name)
        : name_ {name}
        , cc_id_ {(cc_acc() == nullptr) ? 0 : cc_acc()->fetch_add(1, std::memory_order_relaxed)}
        , cc_waiter_{("__CC_CONN__" + name_).c_str()}
        , wt_waiter_{("__WT_CONN__" + name_).c_str()}
        , rd_waiter_{("__RD_CONN__" + name_).c_str()}
        , acc_h_ {("__AC_CONN__" + name_).c_str(), sizeof(acc_t)} {
    }

    void quit_waiting() {
        cc_waiter_.quit_waiting();
        wt_waiter_.quit_waiting();
        rd_waiter_.quit_waiting();
    }

    auto acc() {
        return static_cast<acc_t*>(acc_h_.get());
    }

    auto& recv_cache() {
        thread_local ipc::unordered_map<msg_id_t, cache_t> tls;
        return tls;
    }
};

template <typename W, typename F>
bool wait_for(W& waiter, F&& pred, std::uint64_t tm) {
    if (tm == 0) return !pred();
    for (unsigned k = 0; pred();) {
        bool ret = true;
        ipc::sleep(k, [&k, &ret, &waiter, &pred, tm] {
            ret = waiter.wait_if(std::forward<F>(pred), tm);
            k = 0;
        });
        if (!ret) return false; // timeout or fail
        if (k == 0) break; // k has been reset
    }
    return true;
}

template <typename Policy,
          std::size_t DataSize = ipc::data_length,
          std::size_t AlignSize = (ipc::detail::min)(DataSize, alignof(std::max_align_t))>
struct queue_generator {

    using queue_t = ipc::queue<msg_t<DataSize, AlignSize>, Policy>;

    struct conn_info_t : conn_info_head {
        queue_t que_;

        conn_info_t(char const * name)
            : conn_info_head{name}
            , que_{("__QU_CONN__" +
                    ipc::to_string(DataSize) + "__" +
                    ipc::to_string(AlignSize) + "__" + name).c_str()} {
        }

        void disconnect_receiver() {
            bool dis = que_.disconnect();
            this->quit_waiting();
            if (dis) {
                this->recv_cache().clear();
            }
        }
    };
};

template <typename Policy>
struct detail_impl {

    using policy_t = Policy;
    using flag_t = typename policy_t::flag_t;
    using queue_t = typename queue_generator<policy_t>::queue_t;
    using conn_info_t = typename queue_generator<policy_t>::conn_info_t;

    constexpr static conn_info_t* info_of(ipc::handle_t h) noexcept {
        return static_cast<conn_info_t*>(h);
    }

    constexpr static queue_t* queue_of(ipc::handle_t h) noexcept {
        return (info_of(h) == nullptr) ? nullptr : &(info_of(h)->que_);
    }

    /* API implementations */

    static void disconnect(ipc::handle_t h) {
        auto que = queue_of(h);
        if (que == nullptr) {
            return;
        }
        que->shut_sending();
        assert(info_of(h) != nullptr);
        info_of(h)->disconnect_receiver();
    }

    static bool reconnect(ipc::handle_t * ph, bool start_to_recv) {
        assert(ph != nullptr);
        assert(*ph != nullptr);
        auto que = queue_of(*ph);
        if (que == nullptr) {
            return false;
        }
        if (start_to_recv) {
            que->shut_sending();
            if (que->connect()) { // wouldn't connect twice
                info_of(*ph)->cc_waiter_.broadcast();
                return true;
            }
            return false;
        }
        // start_to_recv == false
        if (que->connected()) {
            info_of(*ph)->disconnect_receiver();
        }
        return que->ready_sending();
    }

    static bool connect(ipc::handle_t * ph, char const * name, bool start_to_recv) {
        assert(ph != nullptr);
        if (*ph == nullptr) {
            *ph = ipc::mem::alloc<conn_info_t>(name);
        }
        return reconnect(ph, start_to_recv);
    }

    static void destroy(ipc::handle_t h) {
        disconnect(h);
        ipc::mem::free(info_of(h));
    }

    static std::size_t recv_count(ipc::handle_t h) noexcept {
        auto que = queue_of(h);
        if (que == nullptr) {
            return ipc::invalid_value;
        }
        return que->conn_count();
    }

    static bool wait_for_recv(ipc::handle_t h, std::size_t r_count, std::uint64_t tm) {
        auto que = queue_of(h);
        if (que == nullptr) {
            return false;
        }
        return wait_for(info_of(h)->cc_waiter_, [que, r_count] {
            return que->conn_count() < r_count;
        }, tm);
    }

    template <typename F>
    static bool send(F&& gen_push, ipc::handle_t h, void const * data, std::size_t size) {
        if (data == nullptr || size == 0) {
            ipc::error("fail: send(%p, %zd)\n", data, size);
            return false;
        }
        auto que = queue_of(h);
        if (que == nullptr) {
            ipc::error("fail: send, queue_of(h) == nullptr\n");
            return false;
        }
        if (que->elems() == nullptr) {
            ipc::error("fail: send, queue_of(h)->elems() == nullptr\n");
            return false;
        }
        if (!que->ready_sending()) {
            ipc::error("fail: send, que->ready_sending() == false\n");
            return false;
        }
        ipc::circ::cc_t conns = que->elems()->connections(std::memory_order_relaxed);
        if (conns == 0) {
            ipc::error("fail: send, there is no receiver on this connection.\n");
            return false;
        }
        // calc a new message id
        auto acc = info_of(h)->acc();
        if (acc == nullptr) {
            ipc::error("fail: send, info_of(h)->acc() == nullptr\n");
            return false;
        }
        auto msg_id = acc->fetch_add(1, std::memory_order_relaxed);
        auto try_push = std::forward<F>(gen_push)(info_of(h), que, msg_id);
        if (size > ipc::large_msg_limit) {
            auto dat = acquire_storage(size, conns);
            void * buf = dat.second;
            if (buf != nullptr) {
                std::memcpy(buf, data, size);
                return try_push(static_cast<std::int32_t>(size) -
                                static_cast<std::int32_t>(ipc::data_length), &(dat.first), 0);
            }
            // try using message fragment
            //ipc::log("fail: shm::handle for big message. msg_id: %zd, size: %zd\n", msg_id, size);
        }
        // push message fragment
        std::int32_t offset = 0;
        for (std::int32_t i = 0; i < static_cast<std::int32_t>(size / ipc::data_length); ++i, offset += ipc::data_length) {
            if (!try_push(static_cast<std::int32_t>(size) - offset - static_cast<std::int32_t>(ipc::data_length),
                          static_cast<ipc::byte_t const *>(data) + offset, ipc::data_length)) {
                return false;
            }
        }
        // if remain > 0, this is the last message fragment
        std::int32_t remain = static_cast<std::int32_t>(size) - offset;
        if (remain > 0) {
            if (!try_push(remain - static_cast<std::int32_t>(ipc::data_length),
                          static_cast<ipc::byte_t const *>(data) + offset,
                          static_cast<std::size_t>(remain))) {
                return false;
            }
        }
        return true;
    }

    static bool send(ipc::handle_t h, void const * data, std::size_t size, std::uint64_t tm) {
        return send([tm](auto info, auto que, auto msg_id) {
            return [tm, info, que, msg_id](std::int32_t remain, void const * data, std::size_t size) {
                if (!wait_for(info->wt_waiter_, [&] {
                        return !que->push(
                            [](void*) { return true; },
                            info->cc_id_, msg_id, remain, data, size);
                    }, tm)) {
                    ipc::log("force_push: msg_id = %zd, remain = %d, size = %zd\n", msg_id, remain, size);
                    if (!que->force_push(
                            clear_message<typename queue_t::value_t>,
                            info->cc_id_, msg_id, remain, data, size)) {
                        return false;
                    }
                }
                info->rd_waiter_.broadcast();
                return true;
            };
        }, h, data, size);
    }

    static bool try_send(ipc::handle_t h, void const * data, std::size_t size, std::uint64_t tm) {
        return send([tm](auto info, auto que, auto msg_id) {
            return [tm, info, que, msg_id](std::int32_t remain, void const * data, std::size_t size) {
                if (!wait_for(info->wt_waiter_, [&] {
                        return !que->push(
                            [](void*) { return true; },
                            info->cc_id_, msg_id, remain, data, size);
                    }, tm)) {
                    return false;
                }
                info->rd_waiter_.broadcast();
                return true;
            };
        }, h, data, size);
    }

    static ipc::buff_t recv(ipc::handle_t h, std::uint64_t tm) {
        auto que = queue_of(h);
        if (que == nullptr) {
            ipc::error("fail: recv, queue_of(h) == nullptr\n");
            return {};
        }
        if (!que->connected()) {
            // hasn't connected yet, just return.
            return {};
        }
        auto& rc = info_of(h)->recv_cache();
        for (;;) {
            // pop a new message
            typename queue_t::value_t msg;
            if (!wait_for(info_of(h)->rd_waiter_, [que, &msg] {
                    return !que->pop(msg);
                }, tm)) {
                // pop failed, just return.
                return {};
            }
            info_of(h)->wt_waiter_.broadcast();
            if ((info_of(h)->acc() != nullptr) && (msg.cc_id_ == info_of(h)->cc_id_)) {
                continue; // ignore message to self
            }
            // msg.remain_ may be negative, with abs(msg.remain_) < data_length
            std::int32_t r_size = static_cast<std::int32_t>(ipc::data_length) + msg.remain_;
            if (r_size <= 0) {
                ipc::error("fail: recv, r_size = %d\n", (int)r_size);
                return {};
            }
            std::size_t msg_size = static_cast<std::size_t>(r_size);
            // large message
            if (msg.storage_) {
                ipc::storage_id_t buf_id = *reinterpret_cast<ipc::storage_id_t*>(&msg.data_);
                void* buf = find_storage(buf_id, msg_size);
                if (buf != nullptr) {
                    struct recycle_t {
                        ipc::storage_id_t storage_id;
                        ipc::circ::cc_t curr_conns;
                        ipc::circ::cc_t conn_id;
                    } *r_info = ipc::mem::alloc<recycle_t>(recycle_t{
                        buf_id, que->elems()->connections(std::memory_order_relaxed), que->connected_id()
                    });
                    if (r_info == nullptr) {
                        ipc::log("fail: ipc::mem::alloc<recycle_t>.\n");
                        return ipc::buff_t{buf, msg_size}; // no recycle
                    } else {
                        return ipc::buff_t{buf, msg_size, [](void* p_info, std::size_t size) {
                            auto r_info = static_cast<recycle_t *>(p_info);
                            IPC_UNUSED_ auto finally = ipc::guard([r_info] {
                                ipc::mem::free(r_info);
                            });
                            recycle_storage<flag_t>(r_info->storage_id, size, r_info->curr_conns, r_info->conn_id);
                        }, r_info};
                    }
                } else {
                    ipc::log("fail: shm::handle for large message. msg_id: %zd, buf_id: %zd, size: %zd\n", msg.id_, buf_id, msg_size);
                    continue;
                }
            }
            // find cache with msg.id_
            auto cac_it = rc.find(msg.id_);
            if (cac_it == rc.end()) {
                if (msg_size <= ipc::data_length) {
                    return make_cache(msg.data_, msg_size);
                }
                // gc
                if (rc.size() > 1024) {
                    std::vector<msg_id_t> need_del;
                    for (auto const & pair : rc) {
                        auto cmp = std::minmax(msg.id_, pair.first);
                        if (cmp.second - cmp.first > 8192) {
                            need_del.push_back(pair.first);
                        }
                    }
                    for (auto id : need_del) rc.erase(id);
                }
                // cache the first message fragment
                rc.emplace(msg.id_, cache_t { ipc::data_length, make_cache(msg.data_, msg_size) });
            }
            // has cached before this message
            else {
                auto& cac = cac_it->second;
                // this is the last message fragment
                if (msg.remain_ <= 0) {
                    cac.append(&(msg.data_), msg_size);
                    // finish this message, erase it from cache
                    auto buff = std::move(cac.buff_);
                    rc.erase(cac_it);
                    return buff;
                }
                // more data remains after this message fragment
                cac.append(&(msg.data_), ipc::data_length);
            }
        }
    }

    static ipc::buff_t try_recv(ipc::handle_t h) {
        return recv(h, 0);
    }
}; // detail_impl<Policy>

template <typename Flag>
using policy_t = ipc::policy::choose<ipc::circ::elem_array, Flag>;

} // internal-linkage

namespace ipc {

template <typename Flag>
ipc::handle_t chan_impl<Flag>::inited() {
    ipc::detail::waiter::init();
    return nullptr;
}

template <typename Flag>
bool chan_impl<Flag>::connect(ipc::handle_t * ph, char const * name, unsigned mode) {
    return detail_impl<policy_t<Flag>>::connect(ph, name, mode & receiver);
}

template <typename Flag>
bool chan_impl<Flag>::reconnect(ipc::handle_t * ph, unsigned mode) {
    return detail_impl<policy_t<Flag>>::reconnect(ph, mode & receiver);
}

template <typename Flag>
void chan_impl<Flag>::disconnect(ipc::handle_t h) {
    detail_impl<policy_t<Flag>>::disconnect(h);
}

template <typename Flag>
void chan_impl<Flag>::destroy(ipc::handle_t h) {
    detail_impl<policy_t<Flag>>::destroy(h);
}

template <typename Flag>
char const * chan_impl<Flag>::name(ipc::handle_t h) {
    auto info = detail_impl<policy_t<Flag>>::info_of(h);
    return (info == nullptr) ? nullptr : info->name_.c_str();
}

template <typename Flag>
std::size_t chan_impl<Flag>::recv_count(ipc::handle_t h) {
    return detail_impl<policy_t<Flag>>::recv_count(h);
}

template <typename Flag>
bool chan_impl<Flag>::wait_for_recv(ipc::handle_t h, std::size_t r_count, std::uint64_t tm) {
    return detail_impl<policy_t<Flag>>::wait_for_recv(h, r_count, tm);
}

template <typename Flag>
bool chan_impl<Flag>::send(ipc::handle_t h, void const * data, std::size_t size, std::uint64_t tm) {
    return detail_impl<policy_t<Flag>>::send(h, data, size, tm);
}

template <typename Flag>
buff_t chan_impl<Flag>::recv(ipc::handle_t h, std::uint64_t tm) {
    return detail_impl<policy_t<Flag>>::recv(h, tm);
}

template <typename Flag>
bool chan_impl<Flag>::try_send(ipc::handle_t h, void const * data, std::size_t size, std::uint64_t tm) {
    return detail_impl<policy_t<Flag>>::try_send(h, data, size, tm);
}

template <typename Flag>
buff_t chan_impl<Flag>::try_recv(ipc::handle_t h) {
    return detail_impl<policy_t<Flag>>::try_recv(h);
}

template struct chan_impl<ipc::wr<relat::single, relat::single, trans::unicast >>;
// template struct chan_impl<ipc::wr<relat::single, relat::multi , trans::unicast >>; // TBD
// template struct chan_impl<ipc::wr<relat::multi , relat::multi , trans::unicast >>; // TBD
template struct chan_impl<ipc::wr<relat::single, relat::multi , trans::broadcast>>;
template struct chan_impl<ipc::wr<relat::multi , relat::multi , trans::broadcast>>;

} // namespace ipc
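In `detail_impl::send`, payloads that do not take the large-message shared-memory path are cut into `ipc::data_length`-byte fragments that share one `msg_id`. Each fragment's `remain_` field is set to the number of bytes from that fragment's start to the end of the payload, minus `data_length`. The receiver therefore recovers `data_length + remain_` as follows: the first fragment yields the total size, and a fragment with `remain_ <= 0` is the last one, holding exactly that many valid bytes. The Python model below reproduces the encoding and the reassembly in `recv`. `DATA_LENGTH = 64` is an arbitrary stand-in for `ipc::data_length`. The model ignores the `size > ipc::large_msg_limit` storage path, the per-thread cache keyed by message id, and the garbage collection of stale cache entries.

```python
DATA_LENGTH = 64  # arbitrary stand-in for ipc::data_length


def fragment(payload: bytes, data_length: int = DATA_LENGTH):
    """Split a payload into (remain, chunk) pairs, mirroring detail_impl::send."""
    size = len(payload)
    frags = []
    offset = 0
    # Full-size fragments: remain = bytes from this fragment's start to the end, minus data_length.
    for _ in range(size // data_length):
        frags.append((size - offset - data_length, payload[offset:offset + data_length]))
        offset += data_length
    # Trailing partial fragment, if any; its remain is negative.
    remain = size - offset
    if remain > 0:
        frags.append((remain - data_length, payload[offset:offset + remain]))
    return frags


def reassemble(frags, data_length: int = DATA_LENGTH) -> bytes:
    """Rebuild one message from its fragments, mirroring the cache logic in detail_impl::recv."""
    buf = None
    fill = 0
    for remain, chunk in frags:
        msg_size = data_length + remain  # r_size in recv()
        if buf is None:
            if msg_size <= data_length:
                return bytes(chunk[:msg_size])  # whole message fits in one fragment
            buf = bytearray(msg_size)  # first fragment carries the total size
            buf[:data_length] = chunk
            fill = data_length
        elif remain <= 0:
            buf[fill:fill + msg_size] = chunk[:msg_size]  # last fragment: msg_size valid bytes
            return bytes(buf)
        else:
            buf[fill:fill + data_length] = chunk  # middle fragment: a full data_length bytes
            fill += data_length
    raise ValueError("message is incomplete")
```

A 200-byte payload with a 64-byte fragment size produces `remain_` values of 136, 72, 8 and -56. The first value plus 64 gives the total size of 200, and the last value plus 64 gives the 8 trailing bytes.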

@@ -1,25 +0,0 @@
#pragma once
#include <type_traits>
#include "libipc/def.h"
#include "libipc/prod_cons.h"
#include "libipc/circ/elem_array.h"
namespace ipc {
namespace policy {
template <template <typename, std::size_t...> class Elems, typename Flag>
struct choose;

template <typename Flag>
struct choose<circ::elem_array, Flag> {
    using flag_t = Flag;

    template <std::size_t DataSize, std::size_t AlignSize>
    using elems_t = circ::elem_array<ipc::prod_cons_impl<flag_t>, DataSize, AlignSize>;
};

} // namespace policy
} // namespace ipc

@@ -1,17 +0,0 @@
#include "libipc/pool_alloc.h"
#include "libipc/memory/resource.h"
namespace ipc {
namespace mem {

void* pool_alloc::alloc(std::size_t size) {
    return async_pool_alloc::alloc(size);
}

void pool_alloc::free(void* p, std::size_t size) {
    async_pool_alloc::free(p, size);
}

} // namespace mem
} // namespace ipc

@@ -1,433 +0,0 @@
#pragma once
#include <atomic>
#include <utility>
#include <cstring>
#include <type_traits>
#include <cstdint>
#include "libipc/def.h"
#include "libipc/platform/detail.h"
#include "libipc/circ/elem_def.h"
#include "libipc/utility/log.h"
#include "libipc/utility/utility.h"
namespace ipc {
////////////////////////////////////////////////////////////////
/// producer-consumer implementation
////////////////////////////////////////////////////////////////
template <typename Flag>
struct prod_cons_impl;

template <>
struct prod_cons_impl<wr<relat::single, relat::single, trans::unicast>> {

    template <std::size_t DataSize, std::size_t AlignSize>
    struct elem_t {
        std::aligned_storage_t<DataSize, AlignSize> data_ {};
    };

    alignas(cache_line_size) std::atomic<circ::u2_t> rd_; // read index
    alignas(cache_line_size) std::atomic<circ::u2_t> wt_; // write index

    constexpr circ::u2_t cursor() const noexcept {
        return 0;
    }

    template <typename W, typename F, typename E>
    bool push(W* /*wrapper*/, F&& f, E* elems) {
        auto cur_wt = circ::index_of(wt_.load(std::memory_order_relaxed));
        if (cur_wt == circ::index_of(rd_.load(std::memory_order_acquire) - 1)) {
            return false; // full
        }
        std::forward<F>(f)(&(elems[cur_wt].data_));
        wt_.fetch_add(1, std::memory_order_release);
        return true;
    }

    /**
     * In single-single-unicast, 'force_push' means 'no reader' or 'the only reader is dead'.
     * So we can just disconnect all connections of the receiver and return false.
     */
    template <typename W, typename F, typename E>
    bool force_push(W* wrapper, F&&, E*) {
        wrapper->elems()->disconnect_receiver(~static_cast<circ::cc_t>(0u));
        return false;
    }

    template <typename W, typename F, typename R, typename E>
    bool pop(W* /*wrapper*/, circ::u2_t& /*cur*/, F&& f, R&& out, E* elems) {
        auto cur_rd = circ::index_of(rd_.load(std::memory_order_relaxed));
        if (cur_rd == circ::index_of(wt_.load(std::memory_order_acquire))) {
            return false; // empty
        }
        std::forward<F>(f)(&(elems[cur_rd].data_));
        std::forward<R>(out)(true);
        rd_.fetch_add(1, std::memory_order_release);
        return true;
    }
};

template <>
struct prod_cons_impl<wr<relat::single, relat::multi , trans::unicast>>
     : prod_cons_impl<wr<relat::single, relat::single, trans::unicast>> {

    template <typename W, typename F, typename E>
    bool force_push(W* wrapper, F&&, E*) {
        wrapper->elems()->disconnect_receiver(1);
        return false;
    }

    template <typename W, typename F, typename R,
              template <std::size_t, std::size_t> class E, std::size_t DS, std::size_t AS>
    bool pop(W* /*wrapper*/, circ::u2_t& /*cur*/, F&& f, R&& out, E<DS, AS>* elems) {
        byte_t buff[DS];
        for (unsigned k = 0;;) {
            auto cur_rd = rd_.load(std::memory_order_relaxed);
            if (circ::index_of(cur_rd) ==
                circ::index_of(wt_.load(std::memory_order_acquire))) {
                return false; // empty
            }
            std::memcpy(buff, &(elems[circ::index_of(cur_rd)].data_), sizeof(buff));
            if (rd_.compare_exchange_weak(cur_rd, cur_rd + 1, std::memory_order_release)) {
                std::forward<F>(f)(buff);
                std::forward<R>(out)(true);
                return true;
            }
            ipc::yield(k);
        }
    }
};

template <>
struct prod_cons_impl<wr<relat::multi , relat::multi, trans::unicast>>
     : prod_cons_impl<wr<relat::single, relat::multi, trans::unicast>> {

    using flag_t = std::uint64_t;

    template <std::size_t DataSize, std::size_t AlignSize>
    struct elem_t {
        std::aligned_storage_t<DataSize, AlignSize> data_ {};
        std::atomic<flag_t> f_ct_ { 0 }; // commit flag
    };

    alignas(cache_line_size) std::atomic<circ::u2_t> ct_; // commit index

    template <typename W, typename F, typename E>
    bool push(W* /*wrapper*/, F&& f, E* elems) {
        circ::u2_t cur_ct, nxt_ct;
        for (unsigned k = 0;;) {
            cur_ct = ct_.load(std::memory_order_relaxed);
            if (circ::index_of(nxt_ct = cur_ct + 1) ==
                circ::index_of(rd_.load(std::memory_order_acquire))) {
                return false; // full
            }
            if (ct_.compare_exchange_weak(cur_ct, nxt_ct, std::memory_order_acq_rel)) {
                break;
            }
            ipc::yield(k);
        }
        auto* el = elems + circ::index_of(cur_ct);
        std::forward<F>(f)(&(el->data_));
        // set flag & try update wt
        el->f_ct_.store(~static_cast<flag_t>(cur_ct), std::memory_order_release);
        while (1) {
            auto cac_ct = el->f_ct_.load(std::memory_order_acquire);
            if (cur_ct != wt_.load(std::memory_order_relaxed)) {
                return true;
            }
            if ((~cac_ct) != cur_ct) {
                return true;
            }
            if (!el->f_ct_.compare_exchange_strong(cac_ct, 0, std::memory_order_relaxed)) {
                return true;
            }
            wt_.store(nxt_ct, std::memory_order_release);
            cur_ct = nxt_ct;
            nxt_ct = cur_ct + 1;
            el = elems + circ::index_of(cur_ct);
        }
        return true;
    }

    template <typename W, typename F, typename E>
    bool force_push(W* wrapper, F&&, E*) {
        wrapper->elems()->disconnect_receiver(1);
        return false;
    }

    template <typename W, typename F, typename R,
              template <std::size_t, std::size_t> class E, std::size_t DS, std::size_t AS>
    bool pop(W* /*wrapper*/, circ::u2_t& /*cur*/, F&& f, R&& out, E<DS, AS>* elems) {
        byte_t buff[DS];
        for (unsigned k = 0;;) {
            auto cur_rd = rd_.load(std::memory_order_relaxed);
            auto cur_wt = wt_.load(std::memory_order_acquire);
            auto id_rd = circ::index_of(cur_rd);
            auto id_wt = circ::index_of(cur_wt);
            if (id_rd == id_wt) {
                auto* el = elems + id_wt;
                auto cac_ct = el->f_ct_.load(std::memory_order_acquire);
                if ((~cac_ct) != cur_wt) {
                    return false; // empty
                }
                if (el->f_ct_.compare_exchange_weak(cac_ct, 0, std::memory_order_relaxed)) {
                    wt_.store(cur_wt + 1, std::memory_order_release);
                }
                k = 0;
            }
            else {
                std::memcpy(buff, &(elems[circ::index_of(cur_rd)].data_), sizeof(buff));
                if (rd_.compare_exchange_weak(cur_rd, cur_rd + 1, std::memory_order_release)) {
                    std::forward<F>(f)(buff);
                    std::forward<R>(out)(true);
                    return true;
                }
                ipc::yield(k);
            }
        }
    }
};
template <>
struct prod_cons_impl<wr<relat::single, relat::multi, trans::broadcast>> {
using rc_t = std::uint64_t;
enum : rc_t {
ep_mask = 0x00000000ffffffffull,
ep_incr = 0x0000000100000000ull
};
template <std::size_t DataSize, std::size_t AlignSize>
struct elem_t {
std::aligned_storage_t<DataSize, AlignSize> data_ {};
std::atomic<rc_t> rc_ { 0 }; // read-counter
};
alignas(cache_line_size) std::atomic<circ::u2_t> wt_; // write index
alignas(cache_line_size) rc_t epoch_ { 0 }; // only one writer
circ::u2_t cursor() const noexcept {
return wt_.load(std::memory_order_acquire);
}
template <typename W, typename F, typename E>
bool push(W* wrapper, F&& f, E* elems) {
E* el;
for (unsigned k = 0;;) {
circ::cc_t cc = wrapper->elems()->connections(std::memory_order_relaxed);
if (cc == 0) return false; // no reader
el = elems + circ::index_of(wt_.load(std::memory_order_relaxed));
// check all consumers have finished reading this element
auto cur_rc = el->rc_.load(std::memory_order_acquire);
circ::cc_t rem_cc = cur_rc & ep_mask;
if ((cc & rem_cc) && ((cur_rc & ~ep_mask) == epoch_)) {
return false; // has not finished yet
}
// consider rem_cc to be 0 here
if (el->rc_.compare_exchange_weak(
cur_rc, epoch_ | static_cast<rc_t>(cc), std::memory_order_release)) {
break;
}
ipc::yield(k);
}
std::forward<F>(f)(&(el->data_));
wt_.fetch_add(1, std::memory_order_release);
return true;
}
template <typename W, typename F, typename E>
bool force_push(W* wrapper, F&& f, E* elems) {
E* el;
epoch_ += ep_incr;
for (unsigned k = 0;;) {
circ::cc_t cc = wrapper->elems()->connections(std::memory_order_relaxed);
if (cc == 0) return false; // no reader
el = elems + circ::index_of(wt_.load(std::memory_order_relaxed));
// check all consumers have finished reading this element
auto cur_rc = el->rc_.load(std::memory_order_acquire);
circ::cc_t rem_cc = cur_rc & ep_mask;
if (cc & rem_cc) {
ipc::log("force_push: k = %u, cc = %u, rem_cc = %u\n", k, cc, rem_cc);
cc = wrapper->elems()->disconnect_receiver(rem_cc); // disconnect all invalid readers
if (cc == 0) return false; // no reader
}
// just compare & exchange
if (el->rc_.compare_exchange_weak(
cur_rc, epoch_ | static_cast<rc_t>(cc), std::memory_order_release)) {
break;
}
ipc::yield(k);
}
std::forward<F>(f)(&(el->data_));
wt_.fetch_add(1, std::memory_order_release);
return true;
}
template <typename W, typename F, typename R, typename E>
bool pop(W* wrapper, circ::u2_t& cur, F&& f, R&& out, E* elems) {
if (cur == cursor()) return false; // acquire
auto* el = elems + circ::index_of(cur++);
std::forward<F>(f)(&(el->data_));
for (unsigned k = 0;;) {
auto cur_rc = el->rc_.load(std::memory_order_acquire);
if ((cur_rc & ep_mask) == 0) {
std::forward<R>(out)(true);
return true;
}
auto nxt_rc = cur_rc & ~static_cast<rc_t>(wrapper->connected_id());
if (el->rc_.compare_exchange_weak(cur_rc, nxt_rc, std::memory_order_release)) {
std::forward<R>(out)((nxt_rc & ep_mask) == 0);
return true;
}
ipc::yield(k);
}
}
};
template <>
struct prod_cons_impl<wr<relat::multi, relat::multi, trans::broadcast>> {
using rc_t = std::uint64_t;
using flag_t = std::uint64_t;
enum : rc_t {
rc_mask = 0x00000000ffffffffull,
ep_mask = 0x00ffffffffffffffull,
ep_incr = 0x0100000000000000ull,
ic_mask = 0xff000000ffffffffull,
ic_incr = 0x0000000100000000ull
};
template <std::size_t DataSize, std::size_t AlignSize>
struct elem_t {
std::aligned_storage_t<DataSize, AlignSize> data_ {};
std::atomic<rc_t > rc_ { 0 }; // read-counter
std::atomic<flag_t> f_ct_ { 0 }; // commit flag
};
alignas(cache_line_size) std::atomic<circ::u2_t> ct_; // commit index
alignas(cache_line_size) std::atomic<rc_t> epoch_ { 0 };
circ::u2_t cursor() const noexcept {
return ct_.load(std::memory_order_acquire);
}
constexpr static rc_t inc_rc(rc_t rc) noexcept {
return (rc & ic_mask) | ((rc + ic_incr) & ~ic_mask);
}
constexpr static rc_t inc_mask(rc_t rc) noexcept {
return inc_rc(rc) & ~rc_mask;
}
template <typename W, typename F, typename E>
bool push(W* wrapper, F&& f, E* elems) {
E* el;
circ::u2_t cur_ct;
rc_t epoch = epoch_.load(std::memory_order_acquire);
for (unsigned k = 0;;) {
circ::cc_t cc = wrapper->elems()->connections(std::memory_order_relaxed);
if (cc == 0) return false; // no reader
el = elems + circ::index_of(cur_ct = ct_.load(std::memory_order_relaxed));
// check all consumers have finished reading this element
auto cur_rc = el->rc_.load(std::memory_order_relaxed);
circ::cc_t rem_cc = cur_rc & rc_mask;
if ((cc & rem_cc) && ((cur_rc & ~ep_mask) == epoch)) {
return false; // has not finished yet
}
else if (!rem_cc) {
auto cur_fl = el->f_ct_.load(std::memory_order_acquire);
if ((cur_fl != cur_ct) && cur_fl) {
return false; // full
}
}
// consider rem_cc to be 0 here
if (el->rc_.compare_exchange_weak(
cur_rc, inc_mask(epoch | (cur_rc & ep_mask)) | static_cast<rc_t>(cc), std::memory_order_relaxed) &&
epoch_.compare_exchange_weak(epoch, epoch, std::memory_order_acq_rel)) {
break;
}
ipc::yield(k);
}
// only one thread/process would touch here at one time
ct_.store(cur_ct + 1, std::memory_order_release);
std::forward<F>(f)(&(el->data_));
// set flag & try update wt
el->f_ct_.store(~static_cast<flag_t>(cur_ct), std::memory_order_release);
return true;
}
template <typename W, typename F, typename E>
bool force_push(W* wrapper, F&& f, E* elems) {
E* el;
circ::u2_t cur_ct;
rc_t epoch = epoch_.fetch_add(ep_incr, std::memory_order_release) + ep_incr;
for (unsigned k = 0;;) {
circ::cc_t cc = wrapper->elems()->connections(std::memory_order_relaxed);
if (cc == 0) return false; // no reader
el = elems + circ::index_of(cur_ct = ct_.load(std::memory_order_relaxed));
// check all consumers have finished reading this element
auto cur_rc = el->rc_.load(std::memory_order_acquire);
circ::cc_t rem_cc = cur_rc & rc_mask;
if (cc & rem_cc) {
ipc::log("force_push: k = %u, cc = %u, rem_cc = %u\n", k, cc, rem_cc);
cc = wrapper->elems()->disconnect_receiver(rem_cc); // disconnect all invalid readers
if (cc == 0) return false; // no reader
}
// just compare & exchange
if (el->rc_.compare_exchange_weak(
cur_rc, inc_mask(epoch | (cur_rc & ep_mask)) | static_cast<rc_t>(cc), std::memory_order_relaxed)) {
if (epoch == epoch_.load(std::memory_order_acquire)) {
break;
}
else if (push(wrapper, std::forward<F>(f), elems)) {
return true;
}
epoch = epoch_.fetch_add(ep_incr, std::memory_order_release) + ep_incr;
}
ipc::yield(k);
}
// only one thread/process would touch here at one time
ct_.store(cur_ct + 1, std::memory_order_release);
std::forward<F>(f)(&(el->data_));
// set flag & try update wt
el->f_ct_.store(~static_cast<flag_t>(cur_ct), std::memory_order_release);
return true;
}
template <typename W, typename F, typename R, typename E, std::size_t N>
bool pop(W* wrapper, circ::u2_t& cur, F&& f, R&& out, E(& elems)[N]) {
auto* el = elems + circ::index_of(cur);
auto cur_fl = el->f_ct_.load(std::memory_order_acquire);
if (cur_fl != ~static_cast<flag_t>(cur)) {
return false; // empty
}
++cur;
std::forward<F>(f)(&(el->data_));
for (unsigned k = 0;;) {
auto cur_rc = el->rc_.load(std::memory_order_acquire);
if ((cur_rc & rc_mask) == 0) {
std::forward<R>(out)(true);
el->f_ct_.store(cur + N - 1, std::memory_order_release);
return true;
}
auto nxt_rc = inc_rc(cur_rc) & ~static_cast<rc_t>(wrapper->connected_id());
bool last_one = false;
if ((last_one = (nxt_rc & rc_mask) == 0)) {
el->f_ct_.store(cur + N - 1, std::memory_order_release);
}
if (el->rc_.compare_exchange_weak(cur_rc, nxt_rc, std::memory_order_release)) {
std::forward<R>(out)(last_one);
return true;
}
ipc::yield(k);
}
}
};
} // namespace ipc
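The broadcast specializations above pack several fields into one 64-bit `rc_t` so that a single compare-exchange updates them together. In the multi-producer/multi-consumer variant, the low 32 bits (`rc_mask`) hold the set of readers that have not yet consumed the slot, bits 32 to 55 hold a counter that `inc_rc` bumps on each read, and the top 8 bits hold the epoch (`ep_incr`). The sketch below copies those constants and the `inc_rc` expression to show that the counter wraps inside its own field instead of carrying into the epoch. `clear_reader` is a hypothetical helper that imitates what `pop()` does with `wrapper->connected_id()`; it is not part of libipc.

```cpp
#include <cassert>
#include <cstdint>

using rc_t = std::uint64_t;

// Field layout copied from the multi/multi broadcast prod_cons_impl.
constexpr rc_t rc_mask = 0x00000000ffffffffull; // pending-reader bitmask (bits 0..31)
constexpr rc_t ep_mask = 0x00ffffffffffffffull; // everything below the epoch byte
constexpr rc_t ep_incr = 0x0100000000000000ull; // one epoch step (bits 56..63)
constexpr rc_t ic_mask = 0xff000000ffffffffull; // bits that inc_rc must preserve
constexpr rc_t ic_incr = 0x0000000100000000ull; // one counter step (bits 32..55)

// Same expression as prod_cons_impl::inc_rc: add one to the 24-bit counter
// and mask the sum so any carry out of bit 55 is discarded.
constexpr rc_t inc_rc(rc_t rc) noexcept {
    return (rc & ic_mask) | ((rc + ic_incr) & ~ic_mask);
}

// Hypothetical helper: what a reader does in pop(), i.e. bump the counter
// and clear its own bit from the pending set.
inline rc_t clear_reader(rc_t rc, rc_t reader_bit) {
    assert(reader_bit != 0 && (reader_bit & ~rc_mask) == 0); // must be a reader bit
    return inc_rc(rc) & ~reader_bit;
}

// Epoch 1, counter at its maximum 0xffffff, readers 0b101 pending:
// the counter wraps to 0 and the epoch byte is left untouched.
static_assert(inc_rc(0x01ffffff00000005ull) == 0x0100000000000005ull,
              "counter wraps in place");
// A plain addition would have carried into the epoch byte instead.
static_assert(((0x01ffffff00000005ull + ic_incr) & ~ep_mask) == 2 * ep_incr,
              "naive add bumps the epoch");
```

Masking the incremented value with `~ic_mask` is what keeps the counter's overflow from silently advancing the epoch, which the writers compare against in `push()`.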

@@ -1,216 +0,0 @@
#pragma once
#include <type_traits>
#include <new>
#include <utility> // [[since C++14]]: std::exchange
#include <algorithm>
#include <atomic>
#include <tuple>
#include <thread>
#include <chrono>
#include <string>
#include <cassert> // assert
#include "libipc/def.h"
#include "libipc/shm.h"
#include "libipc/rw_lock.h"
#include "libipc/utility/log.h"
#include "libipc/platform/detail.h"
#include "libipc/circ/elem_def.h"
namespace ipc {
namespace detail {
class queue_conn {
protected:
circ::cc_t connected_ = 0;
shm::handle elems_h_;
template <typename Elems>
Elems* open(char const * name) {
if (name == nullptr || name[0] == '\0') {
ipc::error("fail open queue: name is empty!\n");
return nullptr;
}
if (!elems_h_.acquire(name, sizeof(Elems))) {
return nullptr;
}
auto elems = static_cast<Elems*>(elems_h_.get());
if (elems == nullptr) {
ipc::error("fail acquire elems: %s\n", name);
return nullptr;
}
elems->init();
return elems;
}
void close() {
elems_h_.release();
}
public:
queue_conn() = default;
queue_conn(const queue_conn&) = delete;
queue_conn& operator=(const queue_conn&) = delete;
bool connected() const noexcept {
return connected_ != 0;
}
circ::cc_t connected_id() const noexcept {
return connected_;
}
template <typename Elems>
auto connect(Elems* elems) noexcept
/*needs 'optional' here*/
-> std::tuple<bool, bool, decltype(std::declval<Elems>().cursor())> {
if (elems == nullptr) return {};
// if it's already connected, just return
if (connected()) return {connected(), false, 0};
connected_ = elems->connect_receiver();
return {connected(), true, elems->cursor()};
}
template <typename Elems>
bool disconnect(Elems* elems) noexcept {
if (elems == nullptr) return false;
// if it's already disconnected, just return false
if (!connected()) return false;
elems->disconnect_receiver(std::exchange(connected_, 0));
return true;
}
};
template <typename Elems>
class queue_base : public queue_conn {
using base_t = queue_conn;
public:
using elems_t = Elems;
using policy_t = typename elems_t::policy_t;
protected:
elems_t * elems_ = nullptr;
decltype(std::declval<elems_t>().cursor()) cursor_ = 0;
bool sender_flag_ = false;
public:
using base_t::base_t;
queue_base() = default;
explicit queue_base(char const * name)
: queue_base{} {
elems_ = open<elems_t>(name);
}
explicit queue_base(elems_t * elems) noexcept
: queue_base{} {
assert(elems != nullptr);
elems_ = elems;
}
/* not virtual */ ~queue_base() {
base_t::close();
}
elems_t * elems() noexcept { return elems_; }
elems_t const * elems() const noexcept { return elems_; }
bool ready_sending() noexcept {
if (elems_ == nullptr) return false;
return sender_flag_ || (sender_flag_ = elems_->connect_sender());
}
void shut_sending() noexcept {
if (elems_ == nullptr) return;
if (!sender_flag_) return;
elems_->disconnect_sender();
}
bool connect() noexcept {
auto tp = base_t::connect(elems_);
if (std::get<0>(tp) && std::get<1>(tp)) {
cursor_ = std::get<2>(tp);
return true;
}
return std::get<0>(tp);
}
bool disconnect() noexcept {
return base_t::disconnect(elems_);
}
std::size_t conn_count() const noexcept {
return (elems_ == nullptr) ? static_cast<std::size_t>(invalid_value) : elems_->conn_count();
}
bool valid() const noexcept {
return elems_ != nullptr;
}
bool empty() const noexcept {
return !valid() || (cursor_ == elems_->cursor());
}
template <typename T, typename F, typename... P>
bool push(F&& prep, P&&... params) {
if (elems_ == nullptr) return false;
return elems_->push(this, [&](void* p) {
if (prep(p)) ::new (p) T(std::forward<P>(params)...);
});
}
template <typename T, typename F, typename... P>
bool force_push(F&& prep, P&&... params) {
if (elems_ == nullptr) return false;
return elems_->force_push(this, [&](void* p) {
if (prep(p)) ::new (p) T(std::forward<P>(params)...);
});
}
template <typename T, typename F>
bool pop(T& item, F&& out) {
if (elems_ == nullptr) {
return false;
}
return elems_->pop(this, &(this->cursor_), [&item](void* p) {
::new (&item) T(std::move(*static_cast<T*>(p)));
}, std::forward<F>(out));
}
};
} // namespace detail
template <typename T, typename Policy>
class queue final : public detail::queue_base<typename Policy::template elems_t<sizeof(T), alignof(T)>> {
using base_t = detail::queue_base<typename Policy::template elems_t<sizeof(T), alignof(T)>>;
public:
using value_t = T;
using base_t::base_t;
template <typename... P>
bool push(P&&... params) {
return base_t::template push<T>(std::forward<P>(params)...);
}
template <typename... P>
bool force_push(P&&... params) {
return base_t::template force_push<T>(std::forward<P>(params)...);
}
bool pop(T& item) {
return base_t::pop(item, [](bool) {});
}
template <typename F>
bool pop(T& item, F&& out) {
return base_t::pop(item, std::forward<F>(out));
}
};
} // namespace ipc
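`queue_base` keeps a private `cursor_` per receiver and reports `empty()` when that cursor has caught up with the shared `elems_->cursor()`; the unicast `pop()` specializations likewise compare a read index against a write index with acquire loads before touching a slot. A minimal single-process sketch of that cursor model, assuming exactly one producer thread, one consumer thread and a power-of-two capacity (`spsc_ring`, `try_push` and `try_pop` are hypothetical names, not libipc API): the cursors grow without bound and only `index_of` folds them into the array, so the unsigned difference `wt - rd` is the number of queued items.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer/single-consumer ring using free-running cursors.
template <typename T, std::size_t N>
class spsc_ring {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
    std::array<T, N> slots_{};
    std::atomic<std::uint32_t> wt_{0}; // cursor of the next slot to write
    std::atomic<std::uint32_t> rd_{0}; // cursor of the next slot to read

    // Fold a free-running cursor into an array index (cf. circ::index_of).
    static std::size_t index_of(std::uint32_t cursor) noexcept {
        return cursor & (N - 1);
    }

public:
    bool try_push(const T& value) {
        std::uint32_t w = wt_.load(std::memory_order_relaxed); // only the producer writes wt_
        if (w - rd_.load(std::memory_order_acquire) == N) return false; // full
        slots_[index_of(w)] = value;
        wt_.store(w + 1, std::memory_order_release); // publish the filled slot
        return true;
    }

    bool try_pop(T& out) {
        std::uint32_t r = rd_.load(std::memory_order_relaxed); // only the consumer writes rd_
        if (r == wt_.load(std::memory_order_acquire)) return false; // empty
        out = slots_[index_of(r)];
        rd_.store(r + 1, std::memory_order_release); // hand the slot back to the producer
        return true;
    }

    bool empty() const noexcept {
        return rd_.load(std::memory_order_acquire) == wt_.load(std::memory_order_acquire);
    }
};
```

Because full and empty are told apart by the cursor difference, all `N` slots are usable; the multi-producer unicast `push()` above instead treats the ring as full when `index_of(cur_ct + 1)` meets the read index, which leaves one slot unused.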

@@ -1,103 +0,0 @@
#include <string>
#include <utility>
#include "libipc/shm.h"
#include "libipc/utility/pimpl.h"
#include "libipc/memory/resource.h"
namespace ipc {
namespace shm {
class handle::handle_ : public pimpl<handle_> {
public:
shm::id_t id_ = nullptr;
void* m_ = nullptr;
ipc::string n_;
std::size_t s_ = 0;
};
handle::handle()
: p_(p_->make()) {
}
handle::handle(char const * name, std::size_t size, unsigned mode)
: handle() {
acquire(name, size, mode);
}
handle::handle(handle&& rhs)
: handle() {
swap(rhs);
}
handle::~handle() {
release();
p_->clear();
}
void handle::swap(handle& rhs) {
std::swap(p_, rhs.p_);
}
handle& handle::operator=(handle rhs) {
swap(rhs);
return *this;
}
bool handle::valid() const noexcept {
return impl(p_)->m_ != nullptr;
}
std::size_t handle::size() const noexcept {
return impl(p_)->s_;
}
char const * handle::name() const noexcept {
return impl(p_)->n_.c_str();
}
std::int32_t handle::ref() const noexcept {
return shm::get_ref(impl(p_)->id_);
}
void handle::sub_ref() noexcept {
shm::sub_ref(impl(p_)->id_);
}
bool handle::acquire(char const * name, std::size_t size, unsigned mode) {
release();
impl(p_)->id_ = shm::acquire((impl(p_)->n_ = name).c_str(), size, mode);
impl(p_)->m_ = shm::get_mem(impl(p_)->id_, &(impl(p_)->s_));
return valid();
}
std::int32_t handle::release() {
if (impl(p_)->id_ == nullptr) return -1;
return shm::release(detach());
}
void* handle::get() const {
return impl(p_)->m_;
}
void handle::attach(id_t id) {
if (id == nullptr) return;
release();
impl(p_)->id_ = id;
impl(p_)->m_ = shm::get_mem(impl(p_)->id_, &(impl(p_)->s_));
}
id_t handle::detach() {
auto old = impl(p_)->id_;
impl(p_)->id_ = nullptr;
impl(p_)->m_ = nullptr;
impl(p_)->s_ = 0;
impl(p_)->n_.clear();
return old;
}
} // namespace shm
} // namespace ipc
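`shm::handle` keeps its state behind a pimpl pointer and builds its move operations from `swap()`: the move constructor first delegates to the default constructor and then swaps, and `operator=` takes its argument by value and swaps with it. A minimal sketch of the same idiom, with `std::unique_ptr` standing in for libipc's `pimpl` helper; `resource_handle` and its members are hypothetical. The moved-from object always ends up holding a fresh empty state, so it stays safe to destroy or reuse.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical handle with the same shape as shm::handle.
class resource_handle {
    struct impl {
        std::string name;
        std::size_t size = 0;
    };
    std::unique_ptr<impl> p_;

public:
    resource_handle() : p_(new impl) {}

    resource_handle(std::string name, std::size_t size) : resource_handle() {
        p_->name = std::move(name);
        p_->size = size;
    }

    // Move: start from a fresh empty state, then trade it for rhs's state.
    resource_handle(resource_handle&& rhs) : resource_handle() { swap(rhs); }

    resource_handle(const resource_handle&) = delete;

    // By-value parameter: an rvalue argument is moved into rhs, then swapped
    // in; our old state leaves with rhs when it goes out of scope.
    resource_handle& operator=(resource_handle rhs) {
        swap(rhs);
        return *this;
    }

    void swap(resource_handle& rhs) noexcept { std::swap(p_, rhs.p_); }

    bool valid() const noexcept { return p_->size != 0; }
    const std::string& name() const noexcept { return p_->name; }
};
```

Taking the assignment argument by value means one `swap()` serves both move construction and move assignment, at the cost of one extra empty `impl` allocation per move.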

@@ -1,83 +0,0 @@
#pragma once
#include <utility>
#include <string>
#include <mutex>
#include <atomic>
#include "libipc/def.h"
#include "libipc/mutex.h"
#include "libipc/condition.h"
#include "libipc/platform/detail.h"
namespace ipc {
namespace detail {
class waiter {
ipc::sync::condition cond_;
ipc::sync::mutex lock_;
std::atomic<bool> quit_ {false};
public:
static void init();
waiter() = default;
waiter(char const *name) {
open(name);
}
~waiter() {
close();
}
bool valid() const noexcept {
return cond_.valid() && lock_.valid();
}
bool open(char const *name) noexcept {
quit_.store(false, std::memory_order_relaxed);
if (!cond_.open((std::string{"_waiter_cond_"} + name).c_str())) {
return false;
}
if (!lock_.open((std::string{"_waiter_lock_"} + name).c_str())) {
cond_.close();
return false;
}
return valid();
}
void close() noexcept {
cond_.close();
lock_.close();
}
template <typename F>
bool wait_if(F &&pred, std::uint64_t tm = ipc::invalid_value) noexcept {
IPC_UNUSED_ std::lock_guard<ipc::sync::mutex> guard {lock_};
while ([this, &pred] {
return !quit_.load(std::memory_order_relaxed)
&& std::forward<F>(pred)();
}()) {
if (!cond_.wait(lock_, tm)) return false;
}
return true;
}
bool notify() noexcept {
std::lock_guard<ipc::sync::mutex>{lock_}; // barrier
return cond_.notify(lock_);
}
bool broadcast() noexcept {
std::lock_guard<ipc::sync::mutex>{lock_}; // barrier
return cond_.broadcast(lock_);
}
bool quit_waiting() {
quit_.store(true, std::memory_order_release);
return broadcast();
}
};
} // namespace detail
} // namespace ipc
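`waiter::wait_if` re-checks the caller's predicate and the `quit_` flag in a loop while holding the lock, and `notify()`/`broadcast()` briefly take the same lock before signalling; the `// barrier` comment suggests this is so a waiter cannot miss a wake-up between its check and its `wait`. A minimal in-process sketch of that pattern, assuming `std::mutex` and `std::condition_variable` in place of the named cross-process `ipc::sync` primitives and a `std::chrono::milliseconds` timeout in place of the raw `tm` argument; `local_waiter` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

class local_waiter {
    std::mutex lock_;
    std::condition_variable cond_;
    std::atomic<bool> quit_{false};

public:
    // Block while pred() is true and quit_waiting() has not been called.
    // Returns false if the timeout elapses first, true otherwise.
    template <typename F>
    bool wait_if(F&& pred, std::chrono::milliseconds tm) {
        const auto deadline = std::chrono::steady_clock::now() + tm;
        std::unique_lock<std::mutex> guard{lock_};
        while (!quit_.load(std::memory_order_relaxed) && pred()) {
            if (cond_.wait_until(guard, deadline) == std::cv_status::timeout) {
                return false;
            }
        }
        return true;
    }

    void broadcast() {
        // Taking the lock once orders this call after any in-progress check:
        // a waiter is either already parked in wait_until() or will see the
        // updated state the next time it evaluates the loop condition.
        { std::lock_guard<std::mutex> barrier{lock_}; }
        cond_.notify_all();
    }

    void quit_waiting() {
        quit_.store(true, std::memory_order_release);
        broadcast();
    }
};
```

A thread that changes whatever the predicate reads should call `broadcast()` afterwards; `quit_waiting()` is the special case where the changed state is `quit_` itself.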

@@ -1,3 +0,0 @@
https://github.com/mutouyun/cpp-ipc
A high-performance inter-process communication library using shared memory on Linux/Windows.

@@ -1,316 +0,0 @@
// jpgd.h - C++ class for JPEG decompression.
// Public domain, Rich Geldreich <richgel99@gmail.com>
#ifndef JPEG_DECODER_H
#define JPEG_DECODER_H
#include <stdlib.h>
#include <stdio.h>
#include <setjmp.h>
namespace jpgd
{
typedef unsigned char uint8;
typedef signed short int16;
typedef unsigned short uint16;
typedef unsigned int uint;
typedef signed int int32;
// Loads a JPEG image from a memory buffer or a file.
// req_comps can be 1 (grayscale), 3 (RGB), or 4 (RGBA).
// On return, width/height will be set to the image's dimensions, and actual_comps will be set to either 1 (grayscale) or 3 (RGB).
// Notes: For more control over where and how the source data is read, see the decompress_jpeg_image_from_stream() function below, or call the jpeg_decoder class directly.
// Requesting an 8 or 32bpp image is currently a little faster than 24bpp because the jpeg_decoder class itself currently always unpacks to either 8 or 32bpp.
// BEGIN EPIC MOD
//unsigned char *decompress_jpeg_image_from_memory(const unsigned char *pSrc_data, int src_data_size, int *width, int *height, int *actual_comps, int req_comps);
unsigned char *decompress_jpeg_image_from_memory(const unsigned char *pSrc_data, int src_data_size, int *width, int *height, int *actual_comps, int req_comps, int format);
// END EPIC MOD
unsigned char *decompress_jpeg_image_from_file(const char *pSrc_filename, int *width, int *height, int *actual_comps, int req_comps);
// Success/failure error codes.
enum jpgd_status
{
JPGD_SUCCESS = 0, JPGD_FAILED = -1, JPGD_DONE = 1,
JPGD_BAD_DHT_COUNTS = -256, JPGD_BAD_DHT_INDEX, JPGD_BAD_DHT_MARKER, JPGD_BAD_DQT_MARKER, JPGD_BAD_DQT_TABLE,
JPGD_BAD_PRECISION, JPGD_BAD_HEIGHT, JPGD_BAD_WIDTH, JPGD_TOO_MANY_COMPONENTS,
JPGD_BAD_SOF_LENGTH, JPGD_BAD_VARIABLE_MARKER, JPGD_BAD_DRI_LENGTH, JPGD_BAD_SOS_LENGTH,
JPGD_BAD_SOS_COMP_ID, JPGD_W_EXTRA_BYTES_BEFORE_MARKER, JPGD_NO_ARITHMITIC_SUPPORT, JPGD_UNEXPECTED_MARKER,
JPGD_NOT_JPEG, JPGD_UNSUPPORTED_MARKER, JPGD_BAD_DQT_LENGTH, JPGD_TOO_MANY_BLOCKS,
JPGD_UNDEFINED_QUANT_TABLE, JPGD_UNDEFINED_HUFF_TABLE, JPGD_NOT_SINGLE_SCAN, JPGD_UNSUPPORTED_COLORSPACE,
JPGD_UNSUPPORTED_SAMP_FACTORS, JPGD_DECODE_ERROR, JPGD_BAD_RESTART_MARKER, JPGD_ASSERTION_ERROR,
JPGD_BAD_SOS_SPECTRAL, JPGD_BAD_SOS_SUCCESSIVE, JPGD_STREAM_READ, JPGD_NOTENOUGHMEM
};
// Input stream interface.
// Derive from this class to read input data from sources other than files or memory. Set *pEOF_flag to true when no more data is available.
// The decoder is rather greedy: it will keep on calling this method until its internal input buffer is full, or until the EOF flag is set.
// If the input stream contains data after the JPEG stream's EOI (end of image) marker, it will probably be pulled into the internal buffer.
// Call the get_total_bytes_read() method to determine the actual size of the JPEG stream after successful decoding.
class jpeg_decoder_stream
{
public:
jpeg_decoder_stream() { }
virtual ~jpeg_decoder_stream() { }
// The read() method is called when the internal input buffer is empty.
// Parameters:
// pBuf - input buffer
// max_bytes_to_read - maximum bytes that can be written to pBuf
// pEOF_flag - set this to true if at end of stream (no more bytes remaining)
// Returns -1 on error, otherwise returns the number of bytes actually written to the buffer (which may be 0).
// Notes: This method will be called in a loop until you set *pEOF_flag to true or the internal buffer is full.
virtual int read(uint8 *pBuf, int max_bytes_to_read, bool *pEOF_flag) = 0;
};
// stdio FILE stream class.
class jpeg_decoder_file_stream : public jpeg_decoder_stream
{
jpeg_decoder_file_stream(const jpeg_decoder_file_stream &);
jpeg_decoder_file_stream &operator =(const jpeg_decoder_file_stream &);
FILE *m_pFile;
bool m_eof_flag, m_error_flag;
public:
jpeg_decoder_file_stream();
virtual ~jpeg_decoder_file_stream();
bool open(const char *Pfilename);
void close();
virtual int read(uint8 *pBuf, int max_bytes_to_read, bool *pEOF_flag);
};
// Memory stream class.
class jpeg_decoder_mem_stream : public jpeg_decoder_stream
{
const uint8 *m_pSrc_data;
uint m_ofs, m_size;
public:
jpeg_decoder_mem_stream() : m_pSrc_data(NULL), m_ofs(0), m_size(0) { }
jpeg_decoder_mem_stream(const uint8 *pSrc_data, uint size) : m_pSrc_data(pSrc_data), m_ofs(0), m_size(size) { }
virtual ~jpeg_decoder_mem_stream() { }
bool open(const uint8 *pSrc_data, uint size);
void close() { m_pSrc_data = NULL; m_ofs = 0; m_size = 0; }
virtual int read(uint8 *pBuf, int max_bytes_to_read, bool *pEOF_flag);
};
// Loads JPEG file from a jpeg_decoder_stream.
unsigned char *decompress_jpeg_image_from_stream(jpeg_decoder_stream *pStream, int *width, int *height, int *actual_comps, int req_comps);
enum
{
JPGD_IN_BUF_SIZE = 8192, JPGD_MAX_BLOCKS_PER_MCU = 10, JPGD_MAX_HUFF_TABLES = 8, JPGD_MAX_QUANT_TABLES = 4,
JPGD_MAX_COMPONENTS = 4, JPGD_MAX_COMPS_IN_SCAN = 4, JPGD_MAX_BLOCKS_PER_ROW = 8192, JPGD_MAX_HEIGHT = 16384, JPGD_MAX_WIDTH = 16384
};
typedef int16 jpgd_quant_t;
typedef int16 jpgd_block_t;
class jpeg_decoder
{
public:
// Call get_error_code() after constructing to determine if the stream is valid or not. You may call the get_width(), get_height(), etc.
// methods after the constructor is called. You may then either destruct the object, or begin decoding the image by calling begin_decoding(), then decode() on each scanline.
jpeg_decoder(jpeg_decoder_stream *pStream);
~jpeg_decoder();
// Call this method after constructing the object to begin decompression.
// If JPGD_SUCCESS is returned you may then call decode() on each scanline.
int begin_decoding();
// Returns the next scan line.
// For grayscale images, pScan_line will point to a buffer containing 8-bit pixels (get_bytes_per_pixel() will return 1).
// Otherwise, it will always point to a buffer containing 32-bit RGBA pixels (A will always be 255, and get_bytes_per_pixel() will return 4).
// Returns JPGD_SUCCESS if a scan line has been returned.
// Returns JPGD_DONE if all scan lines have been returned.
// Returns JPGD_FAILED if an error occurred. Call get_error_code() for more info.
int decode(const void** pScan_line, uint* pScan_line_len);
inline jpgd_status get_error_code() const { return m_error_code; }
inline int get_width() const { return m_image_x_size; }
inline int get_height() const { return m_image_y_size; }
inline int get_num_components() const { return m_comps_in_frame; }
inline int get_bytes_per_pixel() const { return m_dest_bytes_per_pixel; }
inline int get_bytes_per_scan_line() const { return m_image_x_size * get_bytes_per_pixel(); }
// Returns the total number of bytes actually consumed by the decoder (which should equal the actual size of the JPEG file).
inline int get_total_bytes_read() const { return m_total_bytes_read; }
private:
jpeg_decoder(const jpeg_decoder &);
jpeg_decoder &operator =(const jpeg_decoder &);
typedef void (*pDecode_block_func)(jpeg_decoder *, int, int, int);
struct huff_tables
{
bool ac_table;
uint look_up[256];
uint look_up2[256];
uint8 code_size[256];
uint tree[512];
};
struct coeff_buf
{
uint8 *pData;
int block_num_x, block_num_y;
int block_len_x, block_len_y;
int block_size;
};
struct mem_block
{
mem_block *m_pNext;
size_t m_used_count;
size_t m_size;
char m_data[1];
};
jmp_buf m_jmp_state;
mem_block *m_pMem_blocks;
int m_image_x_size;
int m_image_y_size;
jpeg_decoder_stream *m_pStream;
int m_progressive_flag;
uint8 m_huff_ac[JPGD_MAX_HUFF_TABLES];
uint8* m_huff_num[JPGD_MAX_HUFF_TABLES]; // pointer to number of Huffman codes per bit size
uint8* m_huff_val[JPGD_MAX_HUFF_TABLES]; // pointer to Huffman codes per bit size
jpgd_quant_t* m_quant[JPGD_MAX_QUANT_TABLES]; // pointer to quantization tables
int m_scan_type; // Gray, Yh1v1, Yh1v2, Yh2v1, Yh2v2 (CMYK111, CMYK4114 no longer supported)
int m_comps_in_frame; // # of components in frame
int m_comp_h_samp[JPGD_MAX_COMPONENTS]; // component's horizontal sampling factor
int m_comp_v_samp[JPGD_MAX_COMPONENTS]; // component's vertical sampling factor
int m_comp_quant[JPGD_MAX_COMPONENTS]; // component's quantization table selector
int m_comp_ident[JPGD_MAX_COMPONENTS]; // component's ID
int m_comp_h_blocks[JPGD_MAX_COMPONENTS];
int m_comp_v_blocks[JPGD_MAX_COMPONENTS];
int m_comps_in_scan; // # of components in scan
int m_comp_list[JPGD_MAX_COMPS_IN_SCAN]; // components in this scan
int m_comp_dc_tab[JPGD_MAX_COMPONENTS]; // component's DC Huffman coding table selector
int m_comp_ac_tab[JPGD_MAX_COMPONENTS]; // component's AC Huffman coding table selector
int m_spectral_start; // spectral selection start
int m_spectral_end; // spectral selection end
int m_successive_low; // successive approximation low
int m_successive_high; // successive approximation high
int m_max_mcu_x_size; // MCU's max. X size in pixels
int m_max_mcu_y_size; // MCU's max. Y size in pixels
int m_blocks_per_mcu;
int m_max_blocks_per_row;
int m_mcus_per_row, m_mcus_per_col;
int m_mcu_org[JPGD_MAX_BLOCKS_PER_MCU];
int m_total_lines_left; // total # lines left in image
int m_mcu_lines_left; // total # lines left in this MCU
int m_real_dest_bytes_per_scan_line;
int m_dest_bytes_per_scan_line; // rounded up
int m_dest_bytes_per_pixel; // 4 (RGB) or 1 (Y)
huff_tables* m_pHuff_tabs[JPGD_MAX_HUFF_TABLES];
coeff_buf* m_dc_coeffs[JPGD_MAX_COMPONENTS];
coeff_buf* m_ac_coeffs[JPGD_MAX_COMPONENTS];
int m_eob_run;
int m_block_y_mcu[JPGD_MAX_COMPONENTS];
uint8* m_pIn_buf_ofs;
int m_in_buf_left;
int m_tem_flag;
bool m_eof_flag;
uint8 m_in_buf_pad_start[128];
uint8 m_in_buf[JPGD_IN_BUF_SIZE + 128];
uint8 m_in_buf_pad_end[128];
int m_bits_left;
uint m_bit_buf;
int m_restart_interval;
int m_restarts_left;
int m_next_restart_num;
int m_max_mcus_per_row;
int m_max_blocks_per_mcu;
int m_expanded_blocks_per_mcu;
int m_expanded_blocks_per_row;
int m_expanded_blocks_per_component;
bool m_freq_domain_chroma_upsample;
int m_max_mcus_per_col;
uint m_last_dc_val[JPGD_MAX_COMPONENTS];
jpgd_block_t* m_pMCU_coefficients;
int m_mcu_block_max_zag[JPGD_MAX_BLOCKS_PER_MCU];
uint8* m_pSample_buf;
int m_crr[256];
int m_cbb[256];
int m_crg[256];
int m_cbg[256];
uint8* m_pScan_line_0;
uint8* m_pScan_line_1;
jpgd_status m_error_code;
bool m_ready_flag;
int m_total_bytes_read;
void free_all_blocks();
// BEGIN EPIC MOD
UE_NORETURN void stop_decoding(jpgd_status status);
// END EPIC MOD
void *alloc(size_t n, bool zero = false);
void word_clear(void *p, uint16 c, uint n);
void prep_in_buffer();
void read_dht_marker();
void read_dqt_marker();
void read_sof_marker();
void skip_variable_marker();
void read_dri_marker();
void read_sos_marker();
int next_marker();
int process_markers();
void locate_soi_marker();
void locate_sof_marker();
int locate_sos_marker();
void init(jpeg_decoder_stream * pStream);
void create_look_ups();
void fix_in_buffer();
void transform_mcu(int mcu_row);
void transform_mcu_expand(int mcu_row);
coeff_buf* coeff_buf_open(int block_num_x, int block_num_y, int block_len_x, int block_len_y);
inline jpgd_block_t *coeff_buf_getp(coeff_buf *cb, int block_x, int block_y);
void load_next_row();
void decode_next_row();
void make_huff_table(int index, huff_tables *pH);
void check_quant_tables();
void check_huff_tables();
void calc_mcu_block_order();
int init_scan();
void init_frame();
void process_restart();
void decode_scan(pDecode_block_func decode_block_func);
void init_progressive();
void init_sequential();
void decode_start();
void decode_init(jpeg_decoder_stream * pStream);
void H2V2Convert();
void H2V1Convert();
void H1V2Convert();
void H1V1Convert();
void gray_convert();
void expanded_convert();
void find_eoi();
inline uint get_char();
inline uint get_char(bool *pPadding_flag);
inline void stuff_char(uint8 q);
inline uint8 get_octet();
inline uint get_bits(int num_bits);
inline uint get_bits_no_markers(int numbits);
inline int huff_decode(huff_tables *pH);
inline int huff_decode(huff_tables *pH, int& extrabits);
static inline uint8 clamp(int i);
static void decode_block_dc_first(jpeg_decoder *pD, int component_id, int block_x, int block_y);
static void decode_block_dc_refine(jpeg_decoder *pD, int component_id, int block_x, int block_y);
static void decode_block_ac_first(jpeg_decoder *pD, int component_id, int block_x, int block_y);
static void decode_block_ac_refine(jpeg_decoder *pD, int component_id, int block_x, int block_y);
};
} // namespace jpgd
#endif // JPEG_DECODER_H
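The comments on `jpeg_decoder_stream` spell out the `read()` contract: return the number of bytes written (possibly 0) or -1 on error, set `*pEOF_flag` once no data remains, and expect to be called repeatedly until the decoder's input buffer is full or EOF is reported. A minimal sketch of a custom source honoring that contract, assuming a stand-in base class `byte_source` with the same `read()` signature so it compiles without jpgd.h (real code would derive from `jpgd::jpeg_decoder_stream`). `chunked_vector_stream` and `fill_buffer` are hypothetical; `fill_buffer` only imitates the documented calling pattern, not the decoder's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Stand-in with the same read() signature as jpgd::jpeg_decoder_stream.
struct byte_source {
    virtual ~byte_source() = default;
    virtual int read(std::uint8_t* pBuf, int max_bytes_to_read, bool* pEOF_flag) = 0;
};

// Serves an in-memory buffer at most `chunk` bytes per call, e.g. to mimic
// a socket that delivers data piecemeal.
class chunked_vector_stream : public byte_source {
    std::vector<std::uint8_t> data_;
    std::size_t ofs_ = 0;
    std::size_t chunk_;

public:
    chunked_vector_stream(std::vector<std::uint8_t> data, std::size_t chunk)
        : data_(std::move(data)), chunk_(chunk) {
        assert(chunk_ > 0);
    }

    int read(std::uint8_t* pBuf, int max_bytes_to_read, bool* pEOF_flag) override {
        if (pBuf == nullptr || pEOF_flag == nullptr || max_bytes_to_read < 0) return -1;
        std::size_t n = std::min(data_.size() - ofs_,
                                 std::min(chunk_, static_cast<std::size_t>(max_bytes_to_read)));
        if (n > 0) std::memcpy(pBuf, data_.data() + ofs_, n);
        ofs_ += n;
        *pEOF_flag = (ofs_ == data_.size()); // nothing left after this call
        return static_cast<int>(n);
    }
};

// Imitates the documented calling pattern: call read() until `cap` bytes are
// buffered, EOF is reported, or read() fails. Returns bytes filled or -1.
int fill_buffer(byte_source& src, std::uint8_t* buf, int cap, bool& eof) {
    int filled = 0;
    eof = false;
    while (filled < cap && !eof) {
        int n = src.read(buf + filled, cap - filled, &eof);
        if (n < 0) return -1;
        filled += n;
    }
    return filled;
}
```

Since the caller keeps looping on a 0-byte read, a real source that can return 0 without reaching EOF (for example, non-blocking I/O) should block or back off inside `read()` rather than return immediately.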

@@ -1,172 +0,0 @@
// jpge.h - C++ class for JPEG compression.
// Public domain, Rich Geldreich <richgel99@gmail.com>
// Alex Evans: Added RGBA support, linear memory allocator.
#ifndef JPEG_ENCODER_H
#define JPEG_ENCODER_H
#include <stdint.h>
namespace jpge
{
typedef unsigned char uint8;
typedef signed short int16;
typedef signed int int32;
typedef unsigned short uint16;
typedef unsigned int uint32;
typedef unsigned int uint;
// JPEG chroma subsampling factors. Y_ONLY (grayscale images) and H2V2 (color images) are the most common.
enum subsampling_t { Y_ONLY = 0, H1V1 = 1, H2V1 = 2, H2V2 = 3 };
// JPEG compression parameters structure.
struct params
{
inline params() : m_quality(85), m_subsampling(H2V2), m_no_chroma_discrim_flag(false), m_two_pass_flag(false) { }
inline bool check_valid() const
{
if ((m_quality < 1) || (m_quality > 100)) return false;
if ((uint)m_subsampling > (uint)H2V2) return false;
return true;
}
// Quality: 1-100, higher is better. Typical values are around 50-95.
int m_quality;
// m_subsampling:
// 0 = Y (grayscale) only
// 1 = YCbCr, no subsampling (H1V1, YCbCr 1x1x1, 3 blocks per MCU)
// 2 = YCbCr, H2V1 subsampling (YCbCr 2x1x1, 4 blocks per MCU)
// 3 = YCbCr, H2V2 subsampling (YCbCr 4x1x1, 6 blocks per MCU-- very common)
subsampling_t m_subsampling;
// Disables CbCr discrimination - only intended for testing.
// If true, the Y quantization table is also used for the CbCr channels.
bool m_no_chroma_discrim_flag;
bool m_two_pass_flag;
};
// Writes JPEG image to a file.
// num_channels must be 1 (Y) or 3 (RGB), image pitch must be width*num_channels.
bool compress_image_to_jpeg_file(const char *pFilename, int64_t width, int64_t height, int64_t num_channels, const uint8 *pImage_data, const params &comp_params = params());
// Writes JPEG image to memory buffer.
// On entry, buf_size is the size of the output buffer pointed at by pBuf, which should be at least ~1024 bytes.
// If return value is true, buf_size will be set to the size of the compressed data.
bool compress_image_to_jpeg_file_in_memory(void *pBuf, int64_t &buf_size, int64_t width, int64_t height, int64_t num_channels, const uint8 *pImage_data, const params &comp_params = params());
// Output stream abstract class - used by the jpeg_encoder class to write to the output stream.
// put_buf() is generally called with len==JPGE_OUT_BUF_SIZE bytes, but for headers it'll be called with smaller amounts.
class output_stream
{
public:
virtual ~output_stream() { };
virtual bool put_buf(const void* Pbuf, int64_t len) = 0;
template<class T> inline bool put_obj(const T& obj) { return put_buf(&obj, sizeof(T)); }
};
// Lower level jpeg_encoder class - useful if more control is needed than the above helper functions.
class jpeg_encoder
{
public:
jpeg_encoder();
~jpeg_encoder();
// Initializes the compressor.
// pStream - The stream object to use for writing compressed data.
// comp_params - Compression parameters structure, defined above.
// width, height - Image dimensions.
// src_channels - May be 1 or 3. 1 indicates grayscale, 3 indicates RGB source data.
// Returns false on out of memory or if a stream write fails.
bool init(output_stream *pStream, int64_t width, int64_t height, int64_t src_channels, const params &comp_params = params());
const params &get_params() const { return m_params; }
// Deinitializes the compressor, freeing any allocated memory. May be called at any time.
void deinit();
uint get_total_passes() const { return m_params.m_two_pass_flag ? 2 : 1; }
inline uint get_cur_pass() { return m_pass_num; }
// Call this method with each source scanline.
// width * src_channels bytes per scanline is expected (RGB or Y format).
// You must call with NULL after all scanlines are processed to finish compression.
// Returns false on out of memory or if a stream write fails.
bool process_scanline(const void* pScanline);
private:
jpeg_encoder(const jpeg_encoder &);
jpeg_encoder &operator =(const jpeg_encoder &);
typedef int32 sample_array_t;
output_stream *m_pStream;
params m_params;
uint8 m_num_components;
uint8 m_comp_h_samp[3], m_comp_v_samp[3];
int m_image_x, m_image_y, m_image_bpp, m_image_bpl;
int m_image_x_mcu, m_image_y_mcu;
int m_image_bpl_xlt, m_image_bpl_mcu;
int m_mcus_per_row;
int m_mcu_x, m_mcu_y;
uint8 *m_mcu_lines[16];
uint8 m_mcu_y_ofs;
sample_array_t m_sample_array[64];
int16 m_coefficient_array[64];
int32 m_quantization_tables[2][64];
uint m_huff_codes[4][256];
uint8 m_huff_code_sizes[4][256];
uint8 m_huff_bits[4][17];
uint8 m_huff_val[4][256];
uint32 m_huff_count[4][256];
int m_last_dc_val[3];
enum { JPGE_OUT_BUF_SIZE = 2048 };
uint8 m_out_buf[JPGE_OUT_BUF_SIZE];
uint8 *m_pOut_buf;
uint m_out_buf_left;
uint32 m_bit_buffer;
uint m_bits_in;
uint8 m_pass_num;
bool m_all_stream_writes_succeeded;
void optimize_huffman_table(int table_num, int table_len);
void emit_byte(uint8 i);
void emit_word(uint i);
void emit_marker(int marker);
void emit_jfif_app0();
void emit_dqt();
void emit_sof();
void emit_dht(uint8 *bits, uint8 *val, int index, bool ac_flag);
void emit_dhts();
void emit_sos();
void emit_markers();
void compute_huffman_table(uint *codes, uint8 *code_sizes, uint8 *bits, uint8 *val);
void compute_quant_table(int32 *dst, int16 *src);
void adjust_quant_table(int32 *dst, int32 *src);
void first_pass_init();
bool second_pass_init();
bool jpg_open(int p_x_res, int p_y_res, int src_channels);
void load_block_8_8_grey(int x);
void load_block_8_8(int x, int y, int c);
void load_block_16_8(int x, int c);
void load_block_16_8_8(int x, int c);
void load_quantized_coefficients(int component_num);
void flush_output_buffer();
void put_bits(uint bits, uint len);
void code_coefficients_pass_one(int component_num);
void code_coefficients_pass_two(int component_num);
void code_block(int component_num);
void process_mcu_row();
bool terminate_pass_one();
bool terminate_pass_two();
bool process_end_of_image();
void load_mcu(const void* src);
void clear();
void init();
};
} // namespace jpge
#endif // JPEG_ENCODER
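The subsampling comments in `params` give the number of 8x8 blocks per MCU for each mode. The sketch below shows how those modes translate into MCU counts for a given image size. It assumes the usual JPEG MCU sizes: 8x8 pixels for Y_ONLY and H1V1, 16x8 for H2V1, and 16x16 for H2V2. `geometry_for`, `mcus_per_row` and `mcu_rows` are hypothetical helpers, not part of jpge. The enum is redeclared locally so the example compiles on its own.

```cpp
#include <cassert>
#include <cstdint>

// Local copy of the subsampling modes declared in jpge.h (assumed same numbering).
enum subsampling_t { Y_ONLY = 0, H1V1 = 1, H2V1 = 2, H2V2 = 3 };

struct mcu_geometry {
    int width;           // MCU width in pixels
    int height;          // MCU height in pixels
    int blocks_per_mcu;  // 8x8 DCT blocks coded per MCU, all components together
};

mcu_geometry geometry_for(subsampling_t s) {
    switch (s) {
        case Y_ONLY: return {8, 8, 1};    // 1 Y block
        case H1V1:   return {8, 8, 3};    // Y + Cb + Cr
        case H2V1:   return {16, 8, 4};   // 2 Y + Cb + Cr
        case H2V2:   return {16, 16, 6};  // 4 Y + Cb + Cr
    }
    return {0, 0, 0};
}

// Partial MCUs at the right/bottom edges still count as whole MCUs (padded).
std::int64_t mcus_per_row(std::int64_t width, subsampling_t s) {
    const std::int64_t w = geometry_for(s).width;
    return (width + w - 1) / w;
}

std::int64_t mcu_rows(std::int64_t height, subsampling_t s) {
    const std::int64_t h = geometry_for(s).height;
    return (height + h - 1) / h;
}
```

For example, a 100x75 image compressed with H2V2 is covered by 7 MCUs per row and 5 MCU rows. Each of those MCUs carries 6 blocks.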

@@ -1,3 +0,0 @@
jpge.h - C++ class for JPEG compression.
Public domain, Rich Geldreich <richgel99@gmail.com>
Alex Evans: Added RGBA support, linear memory allocator.

@@ -1,433 +0,0 @@
#pragma once
#include <atomic>
#include <utility>
#include <cstring>
#include <type_traits>
#include <cstdint>
#include "libipc/def.h"
#include "libipc/platform/detail.h"
#include "libipc/circ/elem_def.h"
#include "libipc/utility/log.h"
#include "libipc/utility/utility.h"
namespace ipc {
////////////////////////////////////////////////////////////////
/// producer-consumer implementation
////////////////////////////////////////////////////////////////
template <typename Flag>
struct prod_cons_impl;
template <>
struct prod_cons_impl<wr<relat::single, relat::single, trans::unicast>> {
template <std::size_t DataSize, std::size_t AlignSize>
struct elem_t {
std::aligned_storage_t<DataSize, AlignSize> data_ {};
};
alignas(cache_line_size) std::atomic<circ::u2_t> rd_; // read index
alignas(cache_line_size) std::atomic<circ::u2_t> wt_; // write index
constexpr circ::u2_t cursor() const noexcept {
return 0;
}
template <typename W, typename F, typename E>
bool push(W* /*wrapper*/, F&& f, E* elems) {
auto cur_wt = circ::index_of(wt_.load(std::memory_order_relaxed));
if (cur_wt == circ::index_of(rd_.load(std::memory_order_acquire) - 1)) {
return false; // full
}
std::forward<F>(f)(&(elems[cur_wt].data_));
wt_.fetch_add(1, std::memory_order_release);
return true;
}
/**
* In single-single-unicast, 'force_push' means 'no reader' or 'the only reader is dead'.
* So we can simply disconnect all receiver connections and return false.
*/
template <typename W, typename F, typename E>
bool force_push(W* wrapper, F&&, E*) {
wrapper->elems()->disconnect_receiver(~static_cast<circ::cc_t>(0u));
return false;
}
template <typename W, typename F, typename R, typename E>
bool pop(W* /*wrapper*/, circ::u2_t& /*cur*/, F&& f, R&& out, E* elems) {
auto cur_rd = circ::index_of(rd_.load(std::memory_order_relaxed));
if (cur_rd == circ::index_of(wt_.load(std::memory_order_acquire))) {
return false; // empty
}
std::forward<F>(f)(&(elems[cur_rd].data_));
std::forward<R>(out)(true);
rd_.fetch_add(1, std::memory_order_release);
return true;
}
};
template <>
struct prod_cons_impl<wr<relat::single, relat::multi , trans::unicast>>
: prod_cons_impl<wr<relat::single, relat::single, trans::unicast>> {
template <typename W, typename F, typename E>
bool force_push(W* wrapper, F&&, E*) {
wrapper->elems()->disconnect_receiver(1);
return false;
}
template <typename W, typename F, typename R,
template <std::size_t, std::size_t> class E, std::size_t DS, std::size_t AS>
bool pop(W* /*wrapper*/, circ::u2_t& /*cur*/, F&& f, R&& out, E<DS, AS>* elems) {
byte_t buff[DS];
for (unsigned k = 0;;) {
auto cur_rd = rd_.load(std::memory_order_relaxed);
if (circ::index_of(cur_rd) ==
circ::index_of(wt_.load(std::memory_order_acquire))) {
return false; // empty
}
std::memcpy(buff, &(elems[circ::index_of(cur_rd)].data_), sizeof(buff));
if (rd_.compare_exchange_weak(cur_rd, cur_rd + 1, std::memory_order_release)) {
std::forward<F>(f)(buff);
std::forward<R>(out)(true);
return true;
}
ipc::yield(k);
}
}
};
template <>
struct prod_cons_impl<wr<relat::multi , relat::multi, trans::unicast>>
: prod_cons_impl<wr<relat::single, relat::multi, trans::unicast>> {
using flag_t = std::uint64_t;
template <std::size_t DataSize, std::size_t AlignSize>
struct elem_t {
std::aligned_storage_t<DataSize, AlignSize> data_ {};
std::atomic<flag_t> f_ct_ { 0 }; // commit flag
};
alignas(cache_line_size) std::atomic<circ::u2_t> ct_; // commit index
template <typename W, typename F, typename E>
bool push(W* /*wrapper*/, F&& f, E* elems) {
circ::u2_t cur_ct, nxt_ct;
for (unsigned k = 0;;) {
cur_ct = ct_.load(std::memory_order_relaxed);
if (circ::index_of(nxt_ct = cur_ct + 1) ==
circ::index_of(rd_.load(std::memory_order_acquire))) {
return false; // full
}
if (ct_.compare_exchange_weak(cur_ct, nxt_ct, std::memory_order_acq_rel)) {
break;
}
ipc::yield(k);
}
auto* el = elems + circ::index_of(cur_ct);
std::forward<F>(f)(&(el->data_));
// set the commit flag, then try to advance wt_ past every consecutively committed element
el->f_ct_.store(~static_cast<flag_t>(cur_ct), std::memory_order_release);
while (1) {
auto cac_ct = el->f_ct_.load(std::memory_order_acquire);
if (cur_ct != wt_.load(std::memory_order_relaxed)) {
return true;
}
if ((~cac_ct) != cur_ct) {
return true;
}
if (!el->f_ct_.compare_exchange_strong(cac_ct, 0, std::memory_order_relaxed)) {
return true;
}
wt_.store(nxt_ct, std::memory_order_release);
cur_ct = nxt_ct;
nxt_ct = cur_ct + 1;
el = elems + circ::index_of(cur_ct);
}
return true;
}
template <typename W, typename F, typename E>
bool force_push(W* wrapper, F&&, E*) {
wrapper->elems()->disconnect_receiver(1);
return false;
}
template <typename W, typename F, typename R,
template <std::size_t, std::size_t> class E, std::size_t DS, std::size_t AS>
bool pop(W* /*wrapper*/, circ::u2_t& /*cur*/, F&& f, R&& out, E<DS, AS>* elems) {
byte_t buff[DS];
for (unsigned k = 0;;) {
auto cur_rd = rd_.load(std::memory_order_relaxed);
auto cur_wt = wt_.load(std::memory_order_acquire);
auto id_rd = circ::index_of(cur_rd);
auto id_wt = circ::index_of(cur_wt);
if (id_rd == id_wt) {
auto* el = elems + id_wt;
auto cac_ct = el->f_ct_.load(std::memory_order_acquire);
if ((~cac_ct) != cur_wt) {
return false; // empty
}
if (el->f_ct_.compare_exchange_weak(cac_ct, 0, std::memory_order_relaxed)) {
wt_.store(cur_wt + 1, std::memory_order_release);
}
k = 0;
}
else {
std::memcpy(buff, &(elems[circ::index_of(cur_rd)].data_), sizeof(buff));
if (rd_.compare_exchange_weak(cur_rd, cur_rd + 1, std::memory_order_release)) {
std::forward<F>(f)(buff);
std::forward<R>(out)(true);
return true;
}
ipc::yield(k);
}
}
}
};
template <>
struct prod_cons_impl<wr<relat::single, relat::multi, trans::broadcast>> {
using rc_t = std::uint64_t;
enum : rc_t {
ep_mask = 0x00000000ffffffffull,
ep_incr = 0x0000000100000000ull
};
template <std::size_t DataSize, std::size_t AlignSize>
struct elem_t {
std::aligned_storage_t<DataSize, AlignSize> data_ {};
std::atomic<rc_t> rc_ { 0 }; // read-counter
};
alignas(cache_line_size) std::atomic<circ::u2_t> wt_; // write index
alignas(cache_line_size) rc_t epoch_ { 0 }; // only one writer
circ::u2_t cursor() const noexcept {
return wt_.load(std::memory_order_acquire);
}
template <typename W, typename F, typename E>
bool push(W* wrapper, F&& f, E* elems) {
E* el;
for (unsigned k = 0;;) {
circ::cc_t cc = wrapper->elems()->connections(std::memory_order_relaxed);
if (cc == 0) return false; // no reader
el = elems + circ::index_of(wt_.load(std::memory_order_relaxed));
// check all consumers have finished reading this element
auto cur_rc = el->rc_.load(std::memory_order_acquire);
circ::cc_t rem_cc = cur_rc & ep_mask;
if ((cc & rem_cc) && ((cur_rc & ~ep_mask) == epoch_)) {
return false; // has not finished yet
}
// consider rem_cc to be 0 here
if (el->rc_.compare_exchange_weak(
cur_rc, epoch_ | static_cast<rc_t>(cc), std::memory_order_release)) {
break;
}
ipc::yield(k);
}
std::forward<F>(f)(&(el->data_));
wt_.fetch_add(1, std::memory_order_release);
return true;
}
template <typename W, typename F, typename E>
bool force_push(W* wrapper, F&& f, E* elems) {
E* el;
epoch_ += ep_incr;
for (unsigned k = 0;;) {
circ::cc_t cc = wrapper->elems()->connections(std::memory_order_relaxed);
if (cc == 0) return false; // no reader
el = elems + circ::index_of(wt_.load(std::memory_order_relaxed));
// check all consumers have finished reading this element
auto cur_rc = el->rc_.load(std::memory_order_acquire);
circ::cc_t rem_cc = cur_rc & ep_mask;
if (cc & rem_cc) {
ipc::log("force_push: k = %u, cc = %u, rem_cc = %u\n", k, cc, rem_cc);
cc = wrapper->elems()->disconnect_receiver(rem_cc); // disconnect all invalid readers
if (cc == 0) return false; // no reader
}
// just compare & exchange
if (el->rc_.compare_exchange_weak(
cur_rc, epoch_ | static_cast<rc_t>(cc), std::memory_order_release)) {
break;
}
ipc::yield(k);
}
std::forward<F>(f)(&(el->data_));
wt_.fetch_add(1, std::memory_order_release);
return true;
}
template <typename W, typename F, typename R, typename E>
bool pop(W* wrapper, circ::u2_t& cur, F&& f, R&& out, E* elems) {
if (cur == cursor()) return false; // acquire
auto* el = elems + circ::index_of(cur++);
std::forward<F>(f)(&(el->data_));
for (unsigned k = 0;;) {
auto cur_rc = el->rc_.load(std::memory_order_acquire);
if ((cur_rc & ep_mask) == 0) {
std::forward<R>(out)(true);
return true;
}
auto nxt_rc = cur_rc & ~static_cast<rc_t>(wrapper->connected_id());
if (el->rc_.compare_exchange_weak(cur_rc, nxt_rc, std::memory_order_release)) {
std::forward<R>(out)((nxt_rc & ep_mask) == 0);
return true;
}
ipc::yield(k);
}
}
};
template <>
struct prod_cons_impl<wr<relat::multi, relat::multi, trans::broadcast>> {
using rc_t = std::uint64_t;
using flag_t = std::uint64_t;
enum : rc_t {
rc_mask = 0x00000000ffffffffull,
ep_mask = 0x00ffffffffffffffull,
ep_incr = 0x0100000000000000ull,
ic_mask = 0xff000000ffffffffull,
ic_incr = 0x0000000100000000ull
};
template <std::size_t DataSize, std::size_t AlignSize>
struct elem_t {
std::aligned_storage_t<DataSize, AlignSize> data_ {};
std::atomic<rc_t > rc_ { 0 }; // read-counter
std::atomic<flag_t> f_ct_ { 0 }; // commit flag
};
alignas(cache_line_size) std::atomic<circ::u2_t> ct_; // commit index
alignas(cache_line_size) std::atomic<rc_t> epoch_ { 0 };
circ::u2_t cursor() const noexcept {
return ct_.load(std::memory_order_acquire);
}
constexpr static rc_t inc_rc(rc_t rc) noexcept {
return (rc & ic_mask) | ((rc + ic_incr) & ~ic_mask);
}
constexpr static rc_t inc_mask(rc_t rc) noexcept {
return inc_rc(rc) & ~rc_mask;
}
template <typename W, typename F, typename E>
bool push(W* wrapper, F&& f, E* elems) {
E* el;
circ::u2_t cur_ct;
rc_t epoch = epoch_.load(std::memory_order_acquire);
for (unsigned k = 0;;) {
circ::cc_t cc = wrapper->elems()->connections(std::memory_order_relaxed);
if (cc == 0) return false; // no reader
el = elems + circ::index_of(cur_ct = ct_.load(std::memory_order_relaxed));
// check all consumers have finished reading this element
auto cur_rc = el->rc_.load(std::memory_order_relaxed);
circ::cc_t rem_cc = cur_rc & rc_mask;
if ((cc & rem_cc) && ((cur_rc & ~ep_mask) == epoch)) {
return false; // has not finished yet
}
else if (!rem_cc) {
auto cur_fl = el->f_ct_.load(std::memory_order_acquire);
if ((cur_fl != cur_ct) && cur_fl) {
return false; // full
}
}
// consider rem_cc to be 0 here
if (el->rc_.compare_exchange_weak(
cur_rc, inc_mask(epoch | (cur_rc & ep_mask)) | static_cast<rc_t>(cc), std::memory_order_relaxed) &&
epoch_.compare_exchange_weak(epoch, epoch, std::memory_order_acq_rel)) {
break;
}
ipc::yield(k);
}
// only one thread/process can reach this point at a time
ct_.store(cur_ct + 1, std::memory_order_release);
std::forward<F>(f)(&(el->data_));
// publish the element by setting its commit flag (readers compare f_ct_ against ~cur)
el->f_ct_.store(~static_cast<flag_t>(cur_ct), std::memory_order_release);
return true;
}
template <typename W, typename F, typename E>
bool force_push(W* wrapper, F&& f, E* elems) {
E* el;
circ::u2_t cur_ct;
rc_t epoch = epoch_.fetch_add(ep_incr, std::memory_order_release) + ep_incr;
for (unsigned k = 0;;) {
circ::cc_t cc = wrapper->elems()->connections(std::memory_order_relaxed);
if (cc == 0) return false; // no reader
el = elems + circ::index_of(cur_ct = ct_.load(std::memory_order_relaxed));
// check all consumers have finished reading this element
auto cur_rc = el->rc_.load(std::memory_order_acquire);
circ::cc_t rem_cc = cur_rc & rc_mask;
if (cc & rem_cc) {
ipc::log("force_push: k = %u, cc = %u, rem_cc = %u\n", k, cc, rem_cc);
cc = wrapper->elems()->disconnect_receiver(rem_cc); // disconnect all invalid readers
if (cc == 0) return false; // no reader
}
// just compare & exchange
if (el->rc_.compare_exchange_weak(
cur_rc, inc_mask(epoch | (cur_rc & ep_mask)) | static_cast<rc_t>(cc), std::memory_order_relaxed)) {
if (epoch == epoch_.load(std::memory_order_acquire)) {
break;
}
else if (push(wrapper, std::forward<F>(f), elems)) {
return true;
}
epoch = epoch_.fetch_add(ep_incr, std::memory_order_release) + ep_incr;
}
ipc::yield(k);
}
// only one thread/process can reach this point at a time
ct_.store(cur_ct + 1, std::memory_order_release);
std::forward<F>(f)(&(el->data_));
// publish the element by setting its commit flag (readers compare f_ct_ against ~cur)
el->f_ct_.store(~static_cast<flag_t>(cur_ct), std::memory_order_release);
return true;
}
template <typename W, typename F, typename R, typename E, std::size_t N>
bool pop(W* wrapper, circ::u2_t& cur, F&& f, R&& out, E(& elems)[N]) {
auto* el = elems + circ::index_of(cur);
auto cur_fl = el->f_ct_.load(std::memory_order_acquire);
if (cur_fl != ~static_cast<flag_t>(cur)) {
return false; // empty
}
++cur;
std::forward<F>(f)(&(el->data_));
for (unsigned k = 0;;) {
auto cur_rc = el->rc_.load(std::memory_order_acquire);
if ((cur_rc & rc_mask) == 0) {
std::forward<R>(out)(true);
el->f_ct_.store(cur + N - 1, std::memory_order_release);
return true;
}
auto nxt_rc = inc_rc(cur_rc) & ~static_cast<rc_t>(wrapper->connected_id());
bool last_one = false;
if ((last_one = (nxt_rc & rc_mask) == 0)) {
el->f_ct_.store(cur + N - 1, std::memory_order_release);
}
if (el->rc_.compare_exchange_weak(cur_rc, nxt_rc, std::memory_order_release)) {
std::forward<R>(out)(last_one);
return true;
}
ipc::yield(k);
}
}
};
} // namespace ipc
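The single-producer/single-consumer unicast specialization above is, at its core, a classic lock-free ring buffer. The writer publishes a slot by incrementing `wt_` with release ordering, and the reader observes it with an acquire load. One slot is always left empty so that "full" and "empty" can be told apart.

The sketch below restates that protocol as a standalone class, under these assumptions:
- `spsc_ring` is a hypothetical name.
- Indices are 32-bit free-running counters.
- The capacity is a power of two.
- 64 bytes is taken as the cache-line size.

It deliberately omits libipc's shared-memory layout, connection tracking and the `force_push` path.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Minimal SPSC ring buffer (illustrative only, not libipc's API).
// rd_/wt_ are free-running counters; the slot index is counter & (N - 1).
template <typename T, std::size_t N>
class spsc_ring {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer only. Returns false when full (N - 1 elements stored).
    bool push(const T& value) {
        const std::uint32_t wt = wt_.load(std::memory_order_relaxed);
        const std::uint32_t rd = rd_.load(std::memory_order_acquire);  // observe consumer progress
        if (wt - rd == N - 1) return false;                             // full
        slots_[wt & (N - 1)] = value;
        wt_.store(wt + 1, std::memory_order_release);                   // publish the slot
        return true;
    }

    // Consumer only. Returns std::nullopt when empty.
    std::optional<T> pop() {
        const std::uint32_t rd = rd_.load(std::memory_order_relaxed);
        if (rd == wt_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T value = slots_[rd & (N - 1)];
        rd_.store(rd + 1, std::memory_order_release);                   // hand the slot back
        return value;
    }

private:
    std::array<T, N> slots_{};
    alignas(64) std::atomic<std::uint32_t> rd_{0};  // read index
    alignas(64) std::atomic<std::uint32_t> wt_{0};  // write index
};
```

Because each index has exactly one writer, a plain `store` is used where libipc uses `fetch_add`. The full test `wt - rd == N - 1` is the same condition as libipc's `cur_wt == index_of(rd - 1)`.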

@@ -1,58 +0,0 @@
The goal of reducing sequential computation also forms the foundation of the Extended Neural GPU \citep{extendedngpu}, ByteNet \citep{NalBytenet2017} and ConvS2S \citep{JonasFaceNet2017}, all of which use convolutional neural networks as their basic building block, computing hidden representations in parallel for all input and output positions. In these models, the number of operations required to relate signals from two arbitrary input or output positions grows with the distance between positions, linearly for ConvS2S and logarithmically for ByteNet. This makes it more difficult to learn dependencies between distant positions \citep{hochreiter2001gradient}. In the Transformer this is reduced to a constant number of operations, albeit at the cost of reduced effective resolution due to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as described in section~\ref{sec:attention}.
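To make the growth rates concrete, the sketch below counts how many stacked layers are needed before two positions a distance d apart can interact. It rests on two simplifying assumptions:
- Contiguous convolutions of width k, as in ConvS2S, widen the receptive field by k - 1 per layer.
- ByteNet-style dilated convolutions are modeled as width-2 kernels whose dilation doubles each layer, so L layers span 2^L positions.

The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Contiguous kernels of width k >= 2: ceil(d / (k - 1)) layers, linear in d.
std::int64_t conv_layers_needed(std::int64_t d, std::int64_t k) {
    return (d + (k - 2)) / (k - 1);
}

// Width-2 kernels with dilation 1, 2, 4, ...: the smallest L with 2^L - 1 >= d,
// i.e. logarithmic in d.
std::int64_t dilated_layers_needed(std::int64_t d) {
    std::int64_t layers = 0;
    while ((std::int64_t{1} << layers) - 1 < d) ++layers;
    return layers;
}

// A single self-attention layer connects any pair of positions directly.
std::int64_t attention_layers_needed(std::int64_t d) {
    return d > 0 ? 1 : 0;
}
```

For d = 10, this gives 5 layers with width-3 contiguous kernels, 4 dilated layers, and 1 attention layer.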
Self-attention, sometimes called intra-attention, is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence. Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations \citep{cheng2016long, decomposableAttnModel, paulus2017deep, lin2017structured}.
End-to-end memory networks are based on a recurrent attention mechanism instead of sequence-aligned recurrence and have been shown to perform well on simple-language question answering and language modeling tasks \citep{sukhbaatar2015}.
To the best of our knowledge, however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution.
In the following sections, we will describe the Transformer, motivate self-attention and discuss its advantages over models such as \citep{neural_gpu, NalBytenet2017} and \citep{JonasFaceNet2017}.
%\citep{JonasFaceNet2017} report new SOTA on machine translation for English-to-German (EnDe), English-to-French (EnFr) and English-to-Romanian language pairs.
%For example, in MT, we must draw information from both input and previous output words to translate an output word accurately. An attention layer \citep{bahdanau2014neural} can connect a very large number of positions at low computation cost, making it an essential ingredient in competitive recurrent models for machine translation.
%A natural question to ask then is, "Could we replace recurrence with attention?". \marginpar{Don't know if it's the most natural question to ask given the previous statements. Also, need to say that the complexity table summarizes these statements} Such a model would be blessed with the computational efficiency of attention and the power of cross-positional communication. In this work, we show that pure attention models work remarkably well for MT, achieving new SOTA results on EnDe and EnFr, and can be trained in under $2$ days on xyz architecture.
%After the seminal models introduced in \citep{sutskever14, bahdanau2014neural, cho2014learning}, recurrent models have become the dominant solution for both sequence modeling and sequence-to-sequence transduction. Many efforts such as \citep{wu2016google,luong2015effective,jozefowicz2016exploring} have pushed the boundaries of machine translation (MT) and language modeling with recurrent encoder-decoder and recurrent language models. Recent work \citep{shazeer2017outrageously} has successfully combined the power of conditional computation with sequence models to train very large models for MT, pushing SOTA at lower computational cost.
%Recurrent models compute a vector of hidden states $h_t$, for each time step $t$ of computation. $h_t$ is a function of both the input at time $t$ and the previous hidden state $h_{t-1}$. This dependence on the previous hidden state precludes processing all timesteps at once, instead requiring long sequences of sequential operations. In practice, this results in greatly reduced computational efficiency, as on modern computing hardware, a single operation on a large batch is much faster than a large number of operations on small batches. The problem gets worse at longer sequence lengths. Although sequential computation is not a severe bottleneck at inference time, as autoregressively generating each output requires all previous outputs, the inability to compute scores at all output positions at once hinders us from rapidly training our models over large datasets. Although impressive work such as \citep{Kuchaiev2017Factorization} is able to significantly accelerate the training of LSTMs with factorization tricks, we are still bound by the linear dependence on sequence length.
%If the model could compute hidden states at each time step using only the inputs and outputs, it would be liberated from the dependence on results from previous time steps during training. This line of thought is the foundation of recent efforts such as the Markovian neural GPU \citep{neural_gpu}, ByteNet \citep{NalBytenet2017} and ConvS2S \citep{JonasFaceNet2017}, all of which use convolutional neural networks as a building block to compute hidden representations simultaneously for all timesteps, resulting in $O(1)$ sequential time complexity. \citep{JonasFaceNet2017} report new SOTA on machine translation for English-to-German (EnDe), English-to-French (EnFr) and English-to-Romanian language pairs.
%A crucial component for accurate sequence prediction is modeling cross-positional communication. For example, in MT, we must draw information from both input and previous output words to translate an output word accurately. An attention layer \citep{bahdanau2014neural} can connect a very large number of positions at a low computation cost, with $O(1)$ sequential time complexity, making it an essential ingredient in recurrent encoder-decoder architectures for MT. A natural question to ask then is, "Could we replace recurrence with attention?". \marginpar{Don't know if it's the most natural question to ask given the previous statements. Also, need to say that the complexity table summarizes these statements} Such a model would be blessed with the computational efficiency of attention and the power of cross-positional communication. In this work, we show that pure attention models work remarkably well for MT, achieving new SOTA results on EnDe and EnFr, and can be trained in under $2$ days on xyz architecture.
%Note: Facebook model is no better than RNNs in this regard, since it requires a number of layers proportional to the distance you want to communicate. ByteNet is more promising, since it requires a logarithmic number of layers (does ByteNet have SOTA results?).
%Note: An attention layer can connect a very large number of positions at a low computation cost in O(1) sequential operations. This is why encoder-decoder attention has been so successful in seq-to-seq models so far. It is only natural, then, to also use attention to connect the timesteps of the same sequence.
%Note: I wouldn't say that long sequences are not a problem during inference. It would be great if we could infer with no long sequences. We could just say later on that, while our training graph is constant-depth, our model still requires sequential operations in the decoder part during inference due to the autoregressive nature of the model.
%\begin{table}[h!]
%\caption{Attention models are quite efficient for cross-positional communications when sequence length is smaller than channel depth. $n$ represents the sequence length and $d$ represents the channel depth.}
%\label{tab:op_complexities}
%\begin{center}
%\vspace{-5pt}
%\scalebox{0.75}{
%\begin{tabular}{l|c|c|c}
%\hline \hline
%Layer Type & Receptive & Complexity & Sequential \\
% & Field & & Operations \\
%\hline
%Pointwise Feed-Forward & $1$ & $O(n \cdot d^2)$ & $O(1)$ \\
%\hline
%Recurrent & $n$ & $O(n \cdot d^2)$ & $O(n)$ \\
%\hline
%Convolutional & $r$ & $O(r \cdot n \cdot d^2)$ & $O(1)$ \\
%\hline
%Convolutional (separable) & $r$ & $O(r \cdot n \cdot d + n \cdot d^2)$ & $O(1)$ \\
%\hline
%Attention & $r$ & $O(r \cdot n \cdot d)$ & $O(1)$ \\
%\hline \hline
%\end{tabular}
%}
%\end{center}
%\end{table}

@@ -1,18 +0,0 @@
Recurrent neural networks, long short-term memory \citep{hochreiter1997} and gated recurrent \citep{gruEval14} neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation \citep{sutskever14, bahdanau2014neural, cho2014learning}. Numerous efforts have since continued to push the boundaries of recurrent language models and encoder-decoder architectures \citep{wu2016google,luong2015effective,jozefowicz2016exploring}.
Recurrent models typically factor computation along the symbol positions of the input and output sequences. Aligning the positions to steps in computation time, they generate a sequence of hidden states $h_t$, as a function of the previous hidden state $h_{t-1}$ and the input for position $t$. This inherently sequential nature precludes parallelization within training examples, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples.
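The loop-carried dependency described above shows up clearly in a toy scalar recurrent cell. The sketch assumes the simple form $h_t = \tanh(w h_{t-1} + u x_t)$ rather than an LSTM or GRU. Each iteration reads the hidden state written by the previous one, so the loop over positions within one sequence cannot be parallelized.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Toy scalar RNN (illustrative assumption): h_t = tanh(w * h_{t-1} + u * x_t).
std::vector<double> run_rnn(const std::vector<double>& xs, double w, double u, double h0 = 0.0) {
    std::vector<double> hs;
    hs.reserve(xs.size());
    double h = h0;
    for (double x : xs) {
        h = std::tanh(w * h + u * x);  // needs the h produced by the previous step
        hs.push_back(h);
    }
    return hs;
}
```

Batching across examples can still parallelize this loop. Within a single example, computing step t requires finishing step t-1 first.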
%\marginpar{not sure if the memory constraints are understandable here}
Recent work has achieved significant improvements in computational efficiency through factorization tricks \citep{Kuchaiev2017Factorization} and conditional computation \citep{shazeer2017outrageously}, while also improving model performance in case of the latter. The fundamental constraint of sequential computation, however, remains.
%\marginpar{@all: there is work on analyzing what attention really does in seq2seq models, couldn't find it right away}
Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences \citep{bahdanau2014neural, structuredAttentionNetworks}. In all but a few cases \citep{decomposableAttnModel}, however, such attention mechanisms are used in conjunction with a recurrent network.
%\marginpar{not sure if "cross-positional communication" is understandable without explanation}
%\marginpar{insert exact training times and stats for the model that reaches sota earliest, maybe even a single GPU model?}
In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.
%\marginpar{you removed the constant number of repetitions part. I wrote it because I wanted to make it clear that the model does not only perform attention once, while it's also not recurrent. I thought that might be important to get across early.}
% Just a standard paragraph with citations, rewrite.
%After the seminal papers of \citep{sutskever14}, \citep{bahdanau2014neural}, and \citep{cho2014learning}, recurrent models have become the dominant solution for both sequence modeling and sequence-to-sequence transduction. Many efforts such as \citep{wu2016google,luong2015effective,jozefowicz2016exploring} have pushed the boundaries of machine translation and language modeling with recurrent sequence models. Recent work \citep{shazeer2017outrageously} has combined the power of conditional computation with sequence models to train very large models for machine translation, pushing SOTA at lower computational cost. Recurrent models compute a vector of hidden states $h_t$, for each time step $t$ of computation. $h_t$ is a function of both the input at time $t$ and the previous hidden state $h_{t-1}$. This dependence on the previous hidden state prevents recurrent models from processing multiple inputs at once, and their time complexity is a linear function of the length of the input and output, both during training and inference. [What I want to say here is that although this is fine during decoding, at training time we are given both input and output, and this linear nature does not allow the RNN to process all inputs and outputs simultaneously, so RNNs haven't been used on datasets of the scale of the web. What's the largest dataset we have? Talk about NVIDIA's and possibly others' efforts to speed things up, and possibly other efforts that alleviate this but are still limited by its computational nature]. Rest of the intro: What if you could construct the state based on the actual inputs and outputs? Then you could construct them all at once. This has been the foundation of many promising recent efforts: ByteNet, FaceNet (also talk about quasi-RNN here). Now we talk about attention!!
Along with cell architectures such as long short-term memory (LSTM) \citep{hochreiter1997} and gated recurrent units (GRUs) \citep{cho2014learning}, attention has emerged as an essential ingredient in successful sequence models, in particular for machine translation. In recent years, many, if not all, state-of-the-art (SOTA) results in machine translation have been achieved with attention-based sequence models \citep{wu2016google,luong2015effective,jozefowicz2016exploring}. Talk about the neon work on how it played with attention to do self attention! Then talk about what we do.

@@ -1,155 +0,0 @@
\begin{figure}
\centering
\includegraphics[scale=0.6]{Figures/ModalNet-21}
\caption{The Transformer - model architecture.}
\label{fig:model-arch}
\end{figure}
% Although the primary workhorse of our model is attention,
%Our model maintains the encoder-decoder structure that is common to many so-called sequence-to-sequence models \citep{bahdanau2014neural,sutskever14}. As in all such architectures, the encoder computes a representation of the input sequence, and the decoder consumes these representations along with the output tokens to autoregressively produce the output sequence. Where, traditionally, the encoder and decoder contain stacks of recurrent or convolutional layers, our encoder and decoder stacks are composed of attention layers and position-wise feed-forward layers (Figure~\ref{fig:model-arch}). The following sections describe the gross architecture and these particular components in detail.
Most competitive neural sequence transduction models have an encoder-decoder structure \citep{cho2014learning,bahdanau2014neural,sutskever14}. Here, the encoder maps an input sequence of symbol representations $(x_1, ..., x_n)$ to a sequence of continuous representations $\mathbf{z} = (z_1, ..., z_n)$. Given $\mathbf{z}$, the decoder then generates an output sequence $(y_1,...,y_m)$ of symbols one element at a time. At each step the model is auto-regressive \citep{graves2013generating}, consuming the previously generated symbols as additional input when generating the next.
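The auto-regressive loop can be sketched as greedy decoding around a hypothetical step function. Given the encoder output $\mathbf{z}$ and the symbols generated so far, the step function returns the next symbol. The start/end symbols, the greedy choice and the `max_len` cutoff are assumptions for illustration. The paragraph itself only specifies that each step consumes the previously generated symbols.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Token = int;

// Hypothetical model step: maps (z, y_1..y_{t-1}) to y_t.
using StepFn = std::function<Token(const std::vector<double>& z, const std::vector<Token>& prefix)>;

std::vector<Token> greedy_decode(const std::vector<double>& z, const StepFn& step,
                                 Token bos, Token eos, std::size_t max_len) {
    std::vector<Token> out{bos};
    while (out.size() <= max_len) {
        Token next = step(z, out);  // conditions on every symbol generated so far
        if (next == eos) break;
        out.push_back(next);
    }
    out.erase(out.begin());  // drop the start symbol
    return out;
}
```

Each call to `step` depends on the output of the previous call. This is why decoding stays sequential even when the encoder processes all input positions at once.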
The Transformer follows this overall architecture using stacked self-attention and point-wise, fully connected layers for both the encoder and decoder, shown in the left and right halves of Figure~\ref{fig:model-arch}, respectively.
\subsection{Encoder and Decoder Stacks}
\paragraph{Encoder:}The encoder is composed of a stack of $N=6$ identical layers. Each layer has two sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position-wise fully connected feed-forward network. We employ a residual connection \citep{he2016deep} around each of the two sub-layers, followed by layer normalization \citep{layernorm2016}. That is, the output of each sub-layer is $\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x))$, where $\mathrm{Sublayer}(x)$ is the function implemented by the sub-layer itself. To facilitate these residual connections, all sub-layers in the model, as well as the embedding layers, produce outputs of dimension $\dmodel=512$.
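A minimal NumPy sketch of the residual wrapper $\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x))$ described above, assuming NumPy is available. The learned gain and bias of layer normalization are omitted, and the random linear map passed as the sub-layer is a hypothetical stand-in for the attention or feed-forward sub-layer, not part of the model.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each position's feature vector to zero mean and unit variance
    # (learned gain and bias omitted for brevity).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def residual_sublayer(x, sublayer):
    # Post-norm residual connection: LayerNorm(x + Sublayer(x)).
    return layer_norm(x + sublayer(x))

d_model = 512
rng = np.random.default_rng(0)
x = rng.standard_normal((10, d_model))  # 10 positions, each of width d_model
W = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
out = residual_sublayer(x, lambda h: h @ W)  # hypothetical stand-in sub-layer
print(out.shape)  # (10, 512)
```

Because every sub-layer keeps the width at $\dmodel$, the addition $x + \mathrm{Sublayer}(x)$ is always shape-compatible.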
\paragraph{Decoder:}The decoder is also composed of a stack of $N=6$ identical layers. In addition to the two sub-layers in each encoder layer, the decoder inserts a third sub-layer, which performs multi-head attention over the output of the encoder stack. Similar to the encoder, we employ residual connections around each of the sub-layers, followed by layer normalization. We also modify the self-attention sub-layer in the decoder stack to prevent positions from attending to subsequent positions. This masking, combined with the fact that the output embeddings are offset by one position, ensures that the predictions for position $i$ can depend only on the known outputs at positions less than $i$.
% In our model (Figure~\ref{fig:model-arch}), the encoder and decoder are composed of stacks of alternating self-attention layers (for cross-positional communication) and position-wise feed-forward layers (for in-place computation). In addition, the decoder stack contains encoder-decoder attention layers. Since attention is agnostic to the distances between words, our model requires a "positional encoding" to be added to the encoder and decoder input. The following sections describe all of these components in detail.
\subsection{Attention} \label{sec:attention}
An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key.
\subsubsection{Scaled Dot-Product Attention} \label{sec:scaled-dot-prod}
% \begin{figure}
% \centering
% \includegraphics[scale=0.6]{Figures/ModalNet-19}
% \caption{Scaled Dot-Product Attention.}
% \label{fig:multi-head-att}
% \end{figure}
We call our particular attention ``Scaled Dot-Product Attention'' (Figure~\ref{fig:multi-head-att}). The input consists of queries and keys of dimension $d_k$, and values of dimension $d_v$. We compute the dot products of the query with all keys, divide each by $\sqrt{d_k}$, and apply a softmax function to obtain the weights on the values.
In practice, we compute the attention function on a set of queries simultaneously, packed together into a matrix $Q$. The keys and values are also packed together into matrices $K$ and $V$. We compute the matrix of outputs as:
\begin{equation}
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(\frac{QK^T}{\sqrt{d_k}})V
\end{equation}
The two most commonly used attention functions are additive attention \citep{bahdanau2014neural}, and dot-product (multiplicative) attention. Dot-product attention is identical to our algorithm, except for the scaling factor of $\frac{1}{\sqrt{d_k}}$. Additive attention computes the compatibility function using a feed-forward network with a single hidden layer. While the two are similar in theoretical complexity, dot-product attention is much faster and more space-efficient in practice, since it can be implemented using highly optimized matrix multiplication code.
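The matrix form $\mathrm{softmax}(QK^T/\sqrt{d_k})V$ maps directly onto a few array operations. The following is a minimal NumPy sketch (NumPy is an assumption, not the authors' framework); the optional boolean `mask` argument is an illustrative addition that anticipates the decoder masking described later, where disallowed entries are set to $-\infty$ before the softmax.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v).
    mask: optional boolean (n_q, n_k); False entries become -inf logits.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (n_q, n_k) compatibilities
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ V, weights                     # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 64))   # 3 queries, d_k = 64
K = rng.standard_normal((5, 64))   # 5 keys
V = rng.standard_normal((5, 32))   # 5 values, d_v = 32
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 32)
```

The whole computation is two matrix multiplications and a row-wise softmax, which is why it can use highly optimized matrix multiplication code.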
%We scale the dot products by $1/\sqrt{d_k}$ to limit the magnitude of the dot products, which works well in practice. Otherwise, we found applying the softmax to often result in weights very close to 0 or 1, and hence minuscule gradients.
% Already described in the subsequent section
%When used as part of decoder self-attention, an optional mask function is applied just before the softmax to prevent positions from attending to subsequent positions. This mask simply sets the logits corresponding to all illegal connections (those outside of the lower triangle) to $-\infty$.
%\paragraph{Comparison to Additive Attention: } We choose dot product attention over additive attention \citep{bahdanau2014neural} since it can be computed using highly optimized matrix multiplication code. This optimization is particularly important to us, as we employ many attention layers in our model.
While for small values of $d_k$ the two mechanisms perform similarly, additive attention outperforms dot-product attention without scaling for larger values of $d_k$ \citep{DBLP:journals/corr/BritzGLL17}. We suspect that for large values of $d_k$, the dot products grow large in magnitude, pushing the softmax function into regions where it has extremely small gradients\footnote{To illustrate why the dot products get large, assume that the components of $q$ and $k$ are independent random variables with mean $0$ and variance $1$. Then their dot product, $q \cdot k = \sum_{i=1}^{d_k} q_ik_i$, has mean $0$ and variance $d_k$.}. To counteract this effect, we scale the dot products by $\frac{1}{\sqrt{d_k}}$.
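The footnote's variance argument can be checked empirically. This NumPy sketch draws $q$ and $k$ with independent standard-normal components (exactly the footnote's assumption) and compares the variance of $q \cdot k$ before and after dividing by $\sqrt{d_k}$; the sample size and the chosen $d_k$ values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000
results = {}
for d_k in (4, 64, 512):
    # Components of q and k: independent, mean 0, variance 1.
    q = rng.standard_normal((n_samples, d_k))
    k = rng.standard_normal((n_samples, d_k))
    dots = np.einsum("nd,nd->n", q, k)  # one dot product q . k per sample
    scaled = dots / np.sqrt(d_k)
    results[d_k] = (dots.var(), scaled.var())
    print(f"d_k={d_k:4d}  var(q.k)={dots.var():8.2f}  var(q.k/sqrt(d_k))={scaled.var():.3f}")
```

The unscaled variance grows roughly like $d_k$, while the scaled variance stays near $1$ regardless of $d_k$, which keeps the softmax inputs in a range with usable gradients.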
%We suspect this to be caused by the dot products growing too large in magnitude to result in useful gradients after applying the softmax function. To counteract this, we scale the dot product by $1/\sqrt{d_k}$.
\subsubsection{Multi-Head Attention} \label{sec:multihead}
\begin{figure}
\begin{minipage}[t]{0.5\textwidth}
\centering
Scaled Dot-Product Attention \\
\vspace{0.5cm}
\includegraphics[scale=0.6]{Figures/ModalNet-19}
\end{minipage}
\begin{minipage}[t]{0.5\textwidth}
\centering
Multi-Head Attention \\
\vspace{0.1cm}
\includegraphics[scale=0.6]{Figures/ModalNet-20}
\end{minipage}
% \centering
\caption{(left) Scaled Dot-Product Attention. (right) Multi-Head Attention consists of several attention layers running in parallel.}
\label{fig:multi-head-att}
\end{figure}
Instead of performing a single attention function with $\dmodel$-dimensional keys, values and queries, we found it beneficial to linearly project the queries, keys and values $h$ times with different, learned linear projections to $d_k$, $d_k$ and $d_v$ dimensions, respectively.
On each of these projected versions of queries, keys and values we then perform the attention function in parallel, yielding $d_v$-dimensional output values. These are concatenated and once again projected, resulting in the final values, as depicted in Figure~\ref{fig:multi-head-att}.
Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions. With a single attention head, averaging inhibits this.
\begin{align*}
\mathrm{MultiHead}(Q, K, V) &= \mathrm{Concat}(\mathrm{head_1}, ..., \mathrm{head_h})W^O\\
% \mathrm{where} \mathrm{head_i} &= \mathrm{Attention}(QW_Q_i^{\dmodel \times d_q}, KW_K_i^{\dmodel \times d_k}, VW^V_i^{\dmodel \times d_v})\\
\text{where}~\mathrm{head_i} &= \mathrm{Attention}(QW^Q_i, KW^K_i, VW^V_i)
\end{align*}
Where the projections are parameter matrices $W^Q_i \in \mathbb{R}^{\dmodel \times d_k}$, $W^K_i \in \mathbb{R}^{\dmodel \times d_k}$, $W^V_i \in \mathbb{R}^{\dmodel \times d_v}$ and $W^O \in \mathbb{R}^{hd_v \times \dmodel}$.
%find it better (and no more expensive) to have multiple parallel attention layers (each over the full set of positions) with proportionally lower-dimensional keys, values and queries. We call this "Multi-Head Attention" (Figure~\ref{fig:multi-head-att}). The keys, values, and queries for each of these parallel attention layers are computed by learned linear transformations of the inputs to the multi-head attention. We use different linear transformations across different parallel attention layers. The output of the parallel attention layers are concatenated, and then passed through a final learned linear transformation.
In this work we employ $h=8$ parallel attention layers, or heads. For each of these we use $d_k=d_v=\dmodel/h=64$.
Due to the reduced dimension of each head, the total computational cost is similar to that of single-head attention with full dimensionality.
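A minimal NumPy sketch of $\mathrm{MultiHead}(Q, K, V)$ with $h=8$ and $d_k=d_v=\dmodel/h=64$. Storing the per-head projections $W^Q_i, W^K_i, W^V_i$ stacked along a leading head axis, and the scaled random initialization, are illustrative assumptions rather than details from the text.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X_q, X_kv, W_Q, W_K, W_V, W_O, mask=None):
    """X_q: (n_q, d_model), X_kv: (n_k, d_model).
    W_Q, W_K: (h, d_model, d_k); W_V: (h, d_model, d_v); W_O: (h * d_v, d_model)."""
    Q = np.einsum("nd,hdk->hnk", X_q, W_Q)    # (h, n_q, d_k): one projection per head
    K = np.einsum("nd,hdk->hnk", X_kv, W_K)   # (h, n_k, d_k)
    V = np.einsum("nd,hdv->hnv", X_kv, W_V)   # (h, n_k, d_v)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(Q.shape[-1])  # (h, n_q, n_k)
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)  # same mask broadcast to every head
    heads = softmax(scores) @ V                   # (h, n_q, d_v), heads run in parallel
    concat = heads.transpose(1, 0, 2).reshape(X_q.shape[0], -1)  # Concat(head_1..head_h)
    return concat @ W_O                           # final output projection

d_model, h = 512, 8
d_k = d_v = d_model // h  # 64
rng = np.random.default_rng(0)

def init(*shape):
    # Illustrative random initialization, not the paper's scheme.
    return rng.standard_normal(shape) / np.sqrt(d_model)

W_Q, W_K, W_V = init(h, d_model, d_k), init(h, d_model, d_k), init(h, d_model, d_v)
W_O = init(h * d_v, d_model)
X = rng.standard_normal((10, d_model))
Y = multi_head_attention(X, X, W_Q, W_K, W_V, W_O)  # self-attention: Q, K, V from X
print(Y.shape)     # (10, 512)
print(W_Q.size)    # 262144, i.e. 512 * 512, the same as one full-width projection
```

The last line illustrates the cost remark: because each head projects to $\dmodel/h$ dimensions, the eight heads together use as many projection parameters as a single full-dimensional head.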
\subsubsection{Applications of Attention in our Model}
The Transformer uses multi-head attention in three different ways:
\begin{itemize}
\item In ``encoder-decoder attention'' layers, the queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder. This allows every position in the decoder to attend over all positions in the input sequence. This mimics the typical encoder-decoder attention mechanisms in sequence-to-sequence models such as \citep{wu2016google, bahdanau2014neural,JonasFaceNet2017}.
\item The encoder contains self-attention layers. In a self-attention layer all of the keys, values and queries come from the same place, in this case, the output of the previous layer in the encoder. Each position in the encoder can attend to all positions in the previous layer of the encoder.
\item Similarly, self-attention layers in the decoder allow each position in the decoder to attend to all positions in the decoder up to and including that position. We need to prevent leftward information flow in the decoder to preserve the auto-regressive property. We implement this inside of scaled dot-product attention by masking out (setting to $-\infty$) all values in the input of the softmax which correspond to illegal connections. See Figure~\ref{fig:multi-head-att}.
\end{itemize}
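The decoder's masking can be sketched by setting every logit above the diagonal to $-\infty$ before the softmax. This NumPy example (a single head with the projections omitted, a simplification for illustration) checks the auto-regressive property directly: changing the inputs at later positions leaves the outputs at earlier positions unchanged.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # diagonal is always finite, so no NaNs
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(X):
    # Single-head self-attention with Q = K = V = X (learned projections omitted).
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    mask = np.tril(np.ones((n, n), dtype=bool))  # True on and below the diagonal
    scores = np.where(mask, scores, -np.inf)     # illegal (future) connections -> -inf
    return softmax(scores) @ X

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 16))
X_changed = X.copy()
X_changed[4:] = rng.standard_normal((2, 16))  # alter only positions 4 and 5
out, out_changed = causal_self_attention(X), causal_self_attention(X_changed)
print(np.allclose(out[:4], out_changed[:4]))  # True: positions 0..3 never see 4..5
```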
\subsection{Position-wise Feed-Forward Networks}\label{sec:ffn}
In addition to attention sub-layers, each of the layers in our encoder and decoder contains a fully connected feed-forward network, which is applied to each position separately and identically. This consists of two linear transformations with a ReLU activation in between.
\begin{equation}
\mathrm{FFN}(x)=\max(0, xW_1 + b_1) W_2 + b_2
\end{equation}
While the linear transformations are the same across different positions, they use different parameters from layer to layer. Another way of describing this is as two convolutions with kernel size 1. The dimensionality of input and output is $\dmodel=512$, and the inner-layer has dimensionality $d_{ff}=2048$.
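A minimal NumPy sketch of $\mathrm{FFN}(x)=\max(0, xW_1 + b_1) W_2 + b_2$ with $\dmodel=512$ and $d_{ff}=2048$. The random initialization is illustrative only. The check confirms the position-wise property: applying the network to a whole sequence at once gives the same result as applying it to each position separately.

```python
import numpy as np

d_model, d_ff = 512, 2048
rng = np.random.default_rng(0)
W1 = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)  # illustrative init
b1 = np.zeros(d_ff)
W2 = rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff)
b2 = np.zeros(d_model)

def ffn(x):
    # max(0, x W1 + b1) W2 + b2, acting on the last axis only.
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

x = rng.standard_normal((10, d_model))  # a sequence of 10 positions
# Same parameters at every position: batch result equals per-position results.
per_position = np.stack([ffn(x[i]) for i in range(len(x))])
print(np.allclose(ffn(x), per_position))  # True
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)  # 2099712
```

Acting independently on each row is what makes this equivalent to two convolutions with kernel size 1.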
%In the appendix, we describe how the position-wise feed-forward network can also be seen as a form of attention.
%from Jakob: The number of operations required for the model to relate signals from two arbitrary input or output positions grows in the distance between positions in input or output, linearly for ConvS2S and logarithmically for ByteNet, making it harder to learn dependencies between these positions \citep{hochreiter2001gradient}. In the transformer this is reduced to a constant number of operations, albeit at the cost of effective resolution caused by averaging attention-weighted positions, an effect we aim to counteract with multi-headed attention.
%Figure~\ref{fig:simple-att} presents a simple attention function, $A$, with a single head, that forms the basis of our multi-head attention. $A$ takes a query key vector $\kq$, matrices of memory keys $\km$ and memory values $\vm$ ,and produces a query value vector $\vq$ as
%\begin{equation*} \label{eq:attention}
% A(\kq, \km, \vm) = {\vm}^T (Softmax(\km \kq).
%\end{equation*}
%We linearly transform $\kq,\,\km$, and $\vm$ with learned matrices ${\Wkq \text{,} \, \Wkm}$, and ${\Wvm}$ before calling the attention function, and transform the output query with $\Wvq$ before handing it to the feed forward layer. Each attention layer has its own set of transformation matrices, which are shared across all query positions. $A$ is applied in parallel for each query position, and is implemented very efficiently as a batch of matrix multiplies. The self-attention and encoder-decoder attention layers use $A$, but with different arguments. For example, in encoder self-attention, queries in encoder layer $i$ attend to memories in encoder layer $i-1$. To ensure that decoder self-attention layers do not look at future words, we add $-\infty$ to the softmax logits in positions $j+1$ to query length for query position $l$.
%In simple attention, the query value is a weighted combination of the memory values where the attention weights sum to one. Although this function performs well in practice, the constraint on attention weights can restrict the amount of information that flows from memories to queries because the query cannot focus on multiple memory positions at once, which might be desirable when translating long sequences. \marginpar{@usz, could you think of an example of this ?} We remedy this by maintaining multiple attention heads at each query position that attend to all memory positions in parallel, with a different set of parameters per attention head $h$.
%\marginpar{}
\subsection{Embeddings and Softmax}
Similarly to other sequence transduction models, we use learned embeddings to convert the input tokens and output tokens to vectors of dimension $\dmodel$. We also use the usual learned linear transformation and softmax function to convert the decoder output to predicted next-token probabilities. In our model, we share the same weight matrix between the two embedding layers and the pre-softmax linear transformation, similar to \citep{press2016using}. In the embedding layers, we multiply those weights by $\sqrt{\dmodel}$.
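A minimal NumPy sketch of the weight sharing: one matrix serves as the embedding table (scaled by $\sqrt{\dmodel}$ on lookup) and, transposed, as the pre-softmax linear transformation. The vocabulary size, token ids, and initialization are hypothetical values chosen for illustration.

```python
import numpy as np

vocab_size, d_model = 1000, 512  # vocab_size is a hypothetical illustrative value
rng = np.random.default_rng(0)
E = rng.standard_normal((vocab_size, d_model)) / np.sqrt(d_model)  # the one shared matrix

def embed(token_ids):
    # Embedding lookup, with the weights multiplied by sqrt(d_model).
    return E[token_ids] * np.sqrt(d_model)

def next_token_probs(decoder_output):
    # Pre-softmax linear transformation reuses E (transposed); no separate output matrix.
    logits = decoder_output @ E.T
    logits -= logits.max(axis=-1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)

tokens = np.array([5, 42, 7])
h = embed(tokens)             # (3, 512)
probs = next_token_probs(h)   # (3, 1000), one distribution per position
print(h.shape, probs.shape)   # (3, 512) (3, 1000)
```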
\subsection{Positional Encoding}
Since our model contains no recurrence and no convolution, in order for the model to make use of the order of the sequence, we must inject some information about the relative or absolute position of the tokens in the sequence. To this end, we add ``positional encodings'' to the input embeddings at the bottoms of the encoder and decoder stacks. The positional encodings have the same dimension $\dmodel$ as the embeddings, so that the two can be summed. There are many choices of positional encodings, learned and fixed \citep{JonasFaceNet2017}.
In this work, we use sine and cosine functions of different frequencies:
\begin{align*}
PE_{(pos,2i)} &= \sin(pos / 10000^{2i/\dmodel}) \\
PE_{(pos,2i+1)} &= \cos(pos / 10000^{2i/\dmodel})
\end{align*}
where $pos$ is the position and $i$ is the dimension. That is, each dimension of the positional encoding corresponds to a sinusoid. The wavelengths form a geometric progression from $2\pi$ to $10000 \cdot 2\pi$. We chose this function because we hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset $k$, $PE_{pos+k}$ can be represented as a linear function of $PE_{pos}$.
We also experimented with using learned positional embeddings \citep{JonasFaceNet2017} instead, and found that the two versions produced nearly identical results (see Table~\ref{tab:variations} row (E)). We chose the sinusoidal version because it may allow the model to extrapolate to sequence lengths longer than the ones encountered during training.
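A NumPy sketch of the sinusoidal encoding above, plus a check of the relative-position claim. For each frequency $\omega_i = 1/10000^{2i/\dmodel}$, the pair $(\sin(pos\,\omega_i), \cos(pos\,\omega_i))$ shifted by an offset $k$ is a fixed $2\times 2$ rotation of the unshifted pair, so $PE_{pos+k} = PE_{pos} M_k$ for a block-diagonal matrix $M_k$ that does not depend on $pos$. The helper names are hypothetical.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    # PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos(same angle).
    pos = np.arange(max_len)[:, None]                  # (max_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]          # (1, d_model / 2)
    angles = pos / np.power(10000.0, two_i / d_model)  # (max_len, d_model / 2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def offset_matrix(k, d_model):
    # Block-diagonal M_k such that PE[pos + k] = PE[pos] @ M_k for every pos.
    M = np.zeros((d_model, d_model))
    freqs = 1.0 / np.power(10000.0, np.arange(0, d_model, 2) / d_model)
    for j, w in enumerate(freqs):
        c, s = np.cos(k * w), np.sin(k * w)
        # [sin(a+b), cos(a+b)] = [sin a, cos a] @ [[cos b, -sin b], [sin b, cos b]]
        M[2 * j:2 * j + 2, 2 * j:2 * j + 2] = [[c, -s], [s, c]]
    return M

pe = sinusoidal_positional_encoding(100, 512)
M = offset_matrix(5, 512)
print(np.allclose(pe[:95] @ M, pe[5:]))  # True: a shift by 5 is one linear map
```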

@@ -1,45 +0,0 @@
\pagebreak
\section*{Two Feed-Forward Layers = Attention over Parameters}\label{sec:parameter_attention}
In addition to attention layers, our model contains position-wise feed-forward networks (Section~\ref{sec:ffn}), which consist of two linear transformations with a ReLU activation in between. In fact, these networks too can be seen as a form of attention. Compare the formula for such a network with the formula for a simple dot-product attention layer (biases and scaling factors omitted):
\begin{align*}
\mathrm{FFN}(x, W_1, W_2) &= \mathrm{ReLU}(xW_1)W_2 \\
A(q, K, V) &= \mathrm{Softmax}(qK^T)V
\end{align*}
Based on the similarity of these formulae, the two-layer feed-forward network can be seen as a kind of attention, where the keys are the columns of the trainable parameter matrix $W_1$ (i.e., $K = W_1^T$), the values are the rows of the trainable parameter matrix $W_2$, and ReLU is used instead of Softmax in the compatibility function.
%the compatibility function is $compat(q, k_i) = ReLU(q \cdot k_i)$ instead of $Softmax(qK^T)_i$.
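The correspondence can be verified numerically. In this NumPy sketch (random weights, biases omitted as in the formulas above), a ReLU-based ``attention'' whose keys are the $d_{ff}$ columns of $W_1$ and whose values are the $d_{ff}$ rows of $W_2$ reproduces the feed-forward network exactly. The function name `relu_attention` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 512, 2048
W1 = rng.standard_normal((d_model, d_ff))
W2 = rng.standard_normal((d_ff, d_model))
x = rng.standard_normal((3, d_model))  # 3 query positions

def ffn(x):
    return np.maximum(0.0, x @ W1) @ W2  # FFN(x, W1, W2) = ReLU(x W1) W2

def relu_attention(q, K, V):
    # Same shape as A(q, K, V) = Softmax(q K^T) V, but ReLU as the compatibility function.
    return np.maximum(0.0, q @ K.T) @ V

K = W1.T  # d_ff parameter "keys", one per column of W1
V = W2    # d_ff parameter "values", one per row of W2
print(np.allclose(ffn(x), relu_attention(x, K, V)))  # True
```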
Given this similarity, we experimented with replacing the position-wise feed-forward networks with attention layers similar to the ones we use everywhere else in our model. The multi-head-attention-over-parameters sublayer is identical to the multi-head attention described in Section~\ref{sec:multihead}, except that the ``keys'' and ``values'' inputs to each attention head are trainable model parameters, as opposed to being linear projections of a previous layer. These parameters are scaled up by a factor of $\sqrt{\dmodel}$ in order to be more similar to activations.
In our first experiment, we replaced each position-wise feed-forward network with a multi-head-attention-over-parameters sublayer with $h_p=8$ heads, key-dimensionality $d_{pk}=64$, and value-dimensionality $d_{pv}=64$, using $n_p=1536$ key-value pairs for each attention head. The sublayer has a total of $2097152$ parameters, including the parameters in the query projection and the output projection. This matches the number of parameters in the position-wise feed-forward network that we replaced. While the theoretical amount of computation is also the same, in practice, the attention version caused the step times to be about 30\% longer.
In our second experiment, we used $h_p=16$ heads and $n_p=512$ key-value pairs for each attention head, again matching the total number of parameters in the base model.
Results for the first experiment were slightly worse than for the base model, and results for the second experiment were slightly better, see Table~\ref{tab:parameter_attention}.
\begin{table}[h]
\caption{Replacing the position-wise feed-forward networks with multi-head-attention-over-parameters produces similar results to the base model. All metrics are on the English-to-German translation development set, newstest2013.}
\label{tab:parameter_attention}
\begin{center}
\vspace{-2mm}
%\scalebox{1.0}{
\begin{tabular}{c|cccccc|cccc}
\hline\rule{0pt}{2.0ex}
& \multirow{2}{*}{$\dmodel$} & \multirow{2}{*}{$\dff$} &
\multirow{2}{*}{$h_p$} & \multirow{2}{*}{$d_{pk}$} & \multirow{2}{*}{$d_{pv}$} &
\multirow{2}{*}{$n_p$} &
PPL & BLEU & params & training\\
& & & & & & & (dev) & (dev) & $\times10^6$ & time \\
\hline\rule{0pt}{2.0ex}
base & 512 & 2048 & & & & & 4.92 & 25.8 & 65 & 12 hours\\
\hline\rule{0pt}{2.0ex}
AOP$_1$ & 512 & & 8 & 64 & 64 & 1536 & 4.92& 25.5 & 65 & 16 hours\\
AOP$_2$ & 512 & & 16 & 64 & 64 & 512 & \textbf{4.86} & \textbf{25.9} & 65 & 16 hours \\
\hline
\end{tabular}
%}
\end{center}
\end{table}

@@ -1,8 +0,0 @@
The ancestor of ChatGPT: "Attention is all you need"
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
The actual abstract is as follows:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
https://arxiv.org/abs/1706.03762

@@ -1,2 +0,0 @@
from stable_baselines3.dqn.dqn import DQN
from stable_baselines3.dqn.policies import CnnPolicy, MlpPolicy

@@ -1,245 +0,0 @@
from typing import Any, Dict, List, Optional, Tuple, Type, Union

import gym
import numpy as np
import torch as th
from torch.nn import functional as F

from stable_baselines3.common import logger
from stable_baselines3.common.off_policy_algorithm import OffPolicyAlgorithm
from stable_baselines3.common.preprocessing import maybe_transpose
from stable_baselines3.common.type_aliases import GymEnv, MaybeCallback, Schedule
from stable_baselines3.common.utils import get_linear_fn, is_vectorized_observation, polyak_update
from stable_baselines3.dqn.policies import DQNPolicy


class DQN(OffPolicyAlgorithm):
    """
    Deep Q-Network (DQN)

    Paper: https://arxiv.org/abs/1312.5602, https://www.nature.com/articles/nature14236
    Default hyperparameters are taken from the nature paper,
    except for the optimizer and learning rate that were taken from Stable Baselines defaults.

    :param policy: The policy model to use (MlpPolicy, CnnPolicy, ...)
    :param env: The environment to learn from (if registered in Gym, can be str)
    :param learning_rate: The learning rate, it can be a function
        of the current progress remaining (from 1 to 0)
    :param buffer_size: size of the replay buffer
    :param learning_starts: how many steps of the model to collect transitions for before learning starts
    :param batch_size: Minibatch size for each gradient update
    :param tau: the soft update coefficient ("Polyak update", between 0 and 1) default 1 for hard update
    :param gamma: the discount factor
    :param train_freq: Update the model every ``train_freq`` steps. Alternatively pass a tuple of frequency and unit
        like ``(5, "step")`` or ``(2, "episode")``.
    :param gradient_steps: How many gradient steps to do after each rollout (see ``train_freq``)
        Set to ``-1`` means to do as many gradient steps as steps done in the environment
        during the rollout.
    :param optimize_memory_usage: Enable a memory efficient variant of the replay buffer
        at a cost of more complexity.
        See https://github.com/DLR-RM/stable-baselines3/issues/37#issuecomment-637501195
    :param target_update_interval: update the target network every ``target_update_interval``
        environment steps.
    :param exploration_fraction: fraction of entire training period over which the exploration rate is reduced
    :param exploration_initial_eps: initial value of random action probability
    :param exploration_final_eps: final value of random action probability
    :param max_grad_norm: The maximum value for the gradient clipping
    :param tensorboard_log: the log location for tensorboard (if None, no logging)
    :param create_eval_env: Whether to create a second environment that will be
        used for evaluating the agent periodically. (Only available when passing string for the environment)
    :param policy_kwargs: additional arguments to be passed to the policy on creation
    :param verbose: the verbosity level: 0 no output, 1 info, 2 debug
    :param seed: Seed for the pseudo random generators
    :param device: Device (cpu, cuda, ...) on which the code should be run.
        Setting it to auto, the code will be run on the GPU if possible.
    :param _init_setup_model: Whether or not to build the network at the creation of the instance
    """

    def __init__(
        self,
        policy: Union[str, Type[DQNPolicy]],
        env: Union[GymEnv, str],
        learning_rate: Union[float, Schedule] = 1e-4,
        buffer_size: int = 1000000,
        learning_starts: int = 50000,
        batch_size: Optional[int] = 32,
        tau: float = 1.0,
        gamma: float = 0.99,
        train_freq: Union[int, Tuple[int, str]] = 4,
        gradient_steps: int = 1,
        optimize_memory_usage: bool = False,
        target_update_interval: int = 10000,
        exploration_fraction: float = 0.1,
        exploration_initial_eps: float = 1.0,
        exploration_final_eps: float = 0.05,
        max_grad_norm: float = 10,
        tensorboard_log: Optional[str] = None,
        create_eval_env: bool = False,
        policy_kwargs: Optional[Dict[str, Any]] = None,
        verbose: int = 0,
        seed: Optional[int] = None,
        device: Union[th.device, str] = "auto",
        _init_setup_model: bool = True,
    ):

        super(DQN, self).__init__(
            policy,
            env,
            DQNPolicy,
            learning_rate,
            buffer_size,
            learning_starts,
            batch_size,
            tau,
            gamma,
            train_freq,
            gradient_steps,
            action_noise=None,  # No action noise
            policy_kwargs=policy_kwargs,
            tensorboard_log=tensorboard_log,
            verbose=verbose,
            device=device,
            create_eval_env=create_eval_env,
            seed=seed,
            sde_support=False,
            optimize_memory_usage=optimize_memory_usage,
            supported_action_spaces=(gym.spaces.Discrete,),
        )

        self.exploration_initial_eps = exploration_initial_eps
        self.exploration_final_eps = exploration_final_eps
        self.exploration_fraction = exploration_fraction
        self.target_update_interval = target_update_interval
        self.max_grad_norm = max_grad_norm
        # "epsilon" for the epsilon-greedy exploration
        self.exploration_rate = 0.0
        # Linear schedule will be defined in `_setup_model()`
        self.exploration_schedule = None
        self.q_net, self.q_net_target = None, None

        if _init_setup_model:
            self._setup_model()

    def _setup_model(self) -> None:
        super(DQN, self)._setup_model()
        self._create_aliases()
        self.exploration_schedule = get_linear_fn(
            self.exploration_initial_eps, self.exploration_final_eps, self.exploration_fraction
        )

    def _create_aliases(self) -> None:
        self.q_net = self.policy.q_net
        self.q_net_target = self.policy.q_net_target

    def _on_step(self) -> None:
        """
        Update the exploration rate and target network if needed.
        This method is called in ``collect_rollouts()`` after each step in the environment.
        """
        if self.num_timesteps % self.target_update_interval == 0:
            polyak_update(self.q_net.parameters(), self.q_net_target.parameters(), self.tau)

        self.exploration_rate = self.exploration_schedule(self._current_progress_remaining)
        logger.record("rollout/exploration rate", self.exploration_rate)

    def train(self, gradient_steps: int, batch_size: int = 100) -> None:
        # Update learning rate according to schedule
        self._update_learning_rate(self.policy.optimizer)

        losses = []
        for _ in range(gradient_steps):
            # Sample replay buffer
            replay_data = self.replay_buffer.sample(batch_size, env=self._vec_normalize_env)

            with th.no_grad():
                # Compute the next Q-values using the target network
                next_q_values = self.q_net_target(replay_data.next_observations)
                # Follow greedy policy: use the one with the highest value
                next_q_values, _ = next_q_values.max(dim=1)
                # Avoid potential broadcast issue
                next_q_values = next_q_values.reshape(-1, 1)
                # 1-step TD target
                target_q_values = replay_data.rewards + (1 - replay_data.dones) * self.gamma * next_q_values

            # Get current Q-values estimates
            current_q_values = self.q_net(replay_data.observations)

            # Retrieve the q-values for the actions from the replay buffer
            current_q_values = th.gather(current_q_values, dim=1, index=replay_data.actions.long())

            # Compute Huber loss (less sensitive to outliers)
            loss = F.smooth_l1_loss(current_q_values, target_q_values)
            losses.append(loss.item())

            # Optimize the policy
            self.policy.optimizer.zero_grad()
            loss.backward()
            # Clip gradient norm
            th.nn.utils.clip_grad_norm_(self.policy.parameters(), self.max_grad_norm)
            self.policy.optimizer.step()

        # Increase update counter
        self._n_updates += gradient_steps

        logger.record("train/n_updates", self._n_updates, exclude="tensorboard")
        logger.record("train/loss", np.mean(losses))

    def predict(
        self,
        observation: np.ndarray,
        state: Optional[np.ndarray] = None,
        mask: Optional[np.ndarray] = None,
        deterministic: bool = False,
    ) -> Tuple[np.ndarray, Optional[np.ndarray]]:
        """
        Overrides the base_class predict function to include epsilon-greedy exploration.

        :param observation: the input observation
        :param state: The last states (can be None, used in recurrent policies)
        :param mask: The last masks (can be None, used in recurrent policies)
        :param deterministic: Whether or not to return deterministic actions.
        :return: the model's action and the next state
            (used in recurrent policies)
        """
        if not deterministic and np.random.rand() < self.exploration_rate:
            if is_vectorized_observation(maybe_transpose(observation, self.observation_space), self.observation_space):
                n_batch = observation.shape[0]
                action = np.array([self.action_space.sample() for _ in range(n_batch)])
            else:
                action = np.array(self.action_space.sample())
        else:
            action, state = self.policy.predict(observation, state, mask, deterministic)
        return action, state

    def learn(
        self,
        total_timesteps: int,
        callback: MaybeCallback = None,
        log_interval: int = 4,
        eval_env: Optional[GymEnv] = None,
        eval_freq: int = -1,
        n_eval_episodes: int = 5,
        tb_log_name: str = "DQN",
        eval_log_path: Optional[str] = None,
        reset_num_timesteps: bool = True,
    ) -> OffPolicyAlgorithm:

        return super(DQN, self).learn(
            total_timesteps=total_timesteps,
            callback=callback,
            log_interval=log_interval,
            eval_env=eval_env,
            eval_freq=eval_freq,
            n_eval_episodes=n_eval_episodes,
            tb_log_name=tb_log_name,
            eval_log_path=eval_log_path,
            reset_num_timesteps=reset_num_timesteps,
        )

    def _excluded_save_params(self) -> List[str]:
        return super(DQN, self)._excluded_save_params() + ["q_net", "q_net_target"]

    def _get_torch_save_params(self) -> Tuple[List[str], List[str]]:
        state_dicts = ["policy", "policy.optimizer"]

        return state_dicts, []
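The `train()` method above regresses the online Q-values toward a 1-step TD target computed with the target network, using a Huber loss, while `_on_step()` copies weights into the target network with `polyak_update` and decays epsilon with `get_linear_fn`. The NumPy-only sketch below reimplements these pieces so they can be inspected without PyTorch or Gym. `linear_schedule`, `td_targets`, `huber` and `polyak` are hypothetical helpers written for illustration, not stable_baselines3 functions; `linear_schedule` is our reading of how `get_linear_fn` behaves, and `huber` assumes the default `beta=1` of `F.smooth_l1_loss`.

```python
import numpy as np

def linear_schedule(start, end, end_fraction):
    # Decay linearly from `start` to `end` over the first `end_fraction` of training,
    # then hold at `end`. `progress_remaining` goes from 1 (start) to 0 (end).
    def schedule(progress_remaining):
        progress = 1.0 - progress_remaining
        if progress > end_fraction:
            return end
        return start + progress * (end - start) / end_fraction
    return schedule

def td_targets(rewards, dones, next_q_target, gamma):
    # 1-step TD target: r + (1 - done) * gamma * max_a' Q_target(s', a'); shapes (batch, 1).
    return rewards + (1.0 - dones) * gamma * next_q_target.max(axis=1, keepdims=True)

def huber(pred, target):
    # Mean smooth L1 loss with beta = 1: quadratic near 0, linear for large errors.
    diff = np.abs(pred - target)
    return np.mean(np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5))

def polyak(online, target, tau):
    # Soft update; tau = 1.0 (the DQN default above) is a hard copy of the online weights.
    return tau * online + (1.0 - tau) * target

eps = linear_schedule(1.0, 0.05, 0.1)  # DQN defaults: 1.0 -> 0.05 over 10% of training
print(eps(1.0), eps(0.5))  # 1.0 0.05

rewards = np.array([[1.0], [0.0]])
dones = np.array([[0.0], [1.0]])            # the second transition ends the episode
next_q = np.array([[0.5, 2.0], [3.0, 1.0]])
targets = td_targets(rewards, dones, next_q, gamma=0.99)  # approx [[2.98], [0.0]]
```

The `(1 - dones)` factor drops the bootstrap term at episode ends, so the second target is just its reward.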

@@ -1,237 +0,0 @@
from typing import Any, Dict, List, Optional, Type

import gym
import torch as th
from torch import nn

from stable_baselines3.common.policies import BasePolicy, register_policy
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor, FlattenExtractor, NatureCNN, create_mlp
from stable_baselines3.common.type_aliases import Schedule


class QNetwork(BasePolicy):
    """
    Action-Value (Q-Value) network for DQN

    :param observation_space: Observation space
    :param action_space: Action space
    :param net_arch: The specification of the policy and value networks.
    :param activation_fn: Activation function
    :param normalize_images: Whether to normalize images or not,
         dividing by 255.0 (True by default)
    """

    def __init__(
        self,
        observation_space: gym.spaces.Space,
        action_space: gym.spaces.Space,
        features_extractor: nn.Module,
        features_dim: int,
        net_arch: Optional[List[int]] = None,
        activation_fn: Type[nn.Module] = nn.ReLU,
        normalize_images: bool = True,
    ):
        super(QNetwork, self).__init__(
            observation_space,
            action_space,
            features_extractor=features_extractor,
            normalize_images=normalize_images,
        )

        if net_arch is None:
            net_arch = [64, 64]

        self.net_arch = net_arch
        self.activation_fn = activation_fn
        self.features_extractor = features_extractor
        self.features_dim = features_dim
        self.normalize_images = normalize_images
        action_dim = self.action_space.n  # number of actions
        q_net = create_mlp(self.features_dim, action_dim, self.net_arch, self.activation_fn)
        self.q_net = nn.Sequential(*q_net)

    def forward(self, obs: th.Tensor) -> th.Tensor:
        """
        Predict the q-values.

        :param obs: Observation
        :return: The estimated Q-Value for each action.
        """
        return self.q_net(self.extract_features(obs))

    def _predict(self, observation: th.Tensor, deterministic: bool = True) -> th.Tensor:
        q_values = self.forward(observation)
        # Greedy action
        action = q_values.argmax(dim=1).reshape(-1)
        return action

    def _get_constructor_parameters(self) -> Dict[str, Any]:
        data = super()._get_constructor_parameters()

        data.update(
            dict(
                net_arch=self.net_arch,
                features_dim=self.features_dim,
                activation_fn=self.activation_fn,
                features_extractor=self.features_extractor,
            )
        )
        return data


class DQNPolicy(BasePolicy):
    """
    Policy class with Q-Value Net and target net for DQN

    :param observation_space: Observation space
    :param action_space: Action space
    :param lr_schedule: Learning rate schedule (could be constant)
    :param net_arch: The specification of the policy and value networks.
    :param activation_fn: Activation function
    :param features_extractor_class: Features extractor to use.
    :param features_extractor_kwargs: Keyword arguments
        to pass to the features extractor.
    :param normalize_images: Whether to normalize images or not,
         dividing by 255.0 (True by default)
    :param optimizer_class: The optimizer to use,
        ``th.optim.Adam`` by default
    :param optimizer_kwargs: Additional keyword arguments,
        excluding the learning rate, to pass to the optimizer
    """

    def __init__(
        self,
        observation_space: gym.spaces.Space,
        action_space: gym.spaces.Space,
        lr_schedule: Schedule,
        net_arch: Optional[List[int]] = None,
        activation_fn: Type[nn.Module] = nn.ReLU,
        features_extractor_class: Type[BaseFeaturesExtractor] = FlattenExtractor,
        features_extractor_kwargs: Optional[Dict[str, Any]] = None,
        normalize_images: bool = True,
        optimizer_class: Type[th.optim.Optimizer] = th.optim.Adam,
        optimizer_kwargs: Optional[Dict[str, Any]] = None,
    ):
        super(DQNPolicy, self).__init__(
            observation_space,
            action_space,
            features_extractor_class,
            features_extractor_kwargs,
            optimizer_class=optimizer_class,
            optimizer_kwargs=optimizer_kwargs,
        )

        if net_arch is None:
            if features_extractor_class == FlattenExtractor:
                net_arch = [64, 64]
            else:
                net_arch = []

        self.net_arch = net_arch
        self.activation_fn = activation_fn
        self.normalize_images = normalize_images

        self.net_args = {
            "observation_space": self.observation_space,
            "action_space": self.action_space,
            "net_arch": self.net_arch,
            "activation_fn": self.activation_fn,
            "normalize_images": normalize_images,
        }

        self.q_net, self.q_net_target = None, None
        self._build(lr_schedule)

    def _build(self, lr_schedule: Schedule) -> None:
        """
        Create the network and the optimizer.

        :param lr_schedule: Learning rate schedule
            lr_schedule(1) is the initial learning rate
        """

        self.q_net = self.make_q_net()
        self.q_net_target = self.make_q_net()
        self.q_net_target.load_state_dict(self.q_net.state_dict())

        # Setup optimizer with initial learning rate
        self.optimizer = self.optimizer_class(self.parameters(), lr=lr_schedule(1), **self.optimizer_kwargs)

    def make_q_net(self) -> QNetwork:
        # Make sure we always have separate networks for features extractors etc
net_args = self._update_features_extractor(self.net_args, features_extractor=None)
return QNetwork(**net_args).to(self.device)
def forward(self, obs: th.Tensor, deterministic: bool = True) -> th.Tensor:
return self._predict(obs, deterministic=deterministic)
def _predict(self, obs: th.Tensor, deterministic: bool = True) -> th.Tensor:
return self.q_net._predict(obs, deterministic=deterministic)
def _get_constructor_parameters(self) -> Dict[str, Any]:
data = super()._get_constructor_parameters()
data.update(
dict(
net_arch=self.net_args["net_arch"],
activation_fn=self.net_args["activation_fn"],
lr_schedule=self._dummy_schedule, # dummy lr schedule, not needed for loading policy alone
optimizer_class=self.optimizer_class,
optimizer_kwargs=self.optimizer_kwargs,
features_extractor_class=self.features_extractor_class,
features_extractor_kwargs=self.features_extractor_kwargs,
)
)
return data
MlpPolicy = DQNPolicy
class CnnPolicy(DQNPolicy):
"""
Policy class for DQN when using images as input.
:param observation_space: Observation space
:param action_space: Action space
:param lr_schedule: Learning rate schedule (could be constant)
:param net_arch: The specification of the policy and value networks.
:param activation_fn: Activation function
:param features_extractor_class: Features extractor to use.
:param normalize_images: Whether to normalize images or not,
dividing by 255.0 (True by default)
:param optimizer_class: The optimizer to use,
``th.optim.Adam`` by default
:param optimizer_kwargs: Additional keyword arguments,
excluding the learning rate, to pass to the optimizer
"""
def __init__(
self,
observation_space: gym.spaces.Space,
action_space: gym.spaces.Space,
lr_schedule: Schedule,
net_arch: Optional[List[int]] = None,
activation_fn: Type[nn.Module] = nn.ReLU,
features_extractor_class: Type[BaseFeaturesExtractor] = NatureCNN,
features_extractor_kwargs: Optional[Dict[str, Any]] = None,
normalize_images: bool = True,
optimizer_class: Type[th.optim.Optimizer] = th.optim.Adam,
optimizer_kwargs: Optional[Dict[str, Any]] = None,
):
super(CnnPolicy, self).__init__(
observation_space,
action_space,
lr_schedule,
net_arch,
activation_fn,
features_extractor_class,
features_extractor_kwargs,
normalize_images,
optimizer_class,
optimizer_kwargs,
)
register_policy("MlpPolicy", MlpPolicy)
register_policy("CnnPolicy", CnnPolicy)
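This policy relies on two mechanics. `QNetwork._predict` selects the greedy action with `q_values.argmax(dim=1)`, and `_build` initialises the target network as an exact copy of the online network with `load_state_dict(self.q_net.state_dict())`. The framework-free sketch below illustrates both, using plain Python lists in place of tensors. `greedy_actions` and `hard_update` are hypothetical helpers and are not part of the Stable-Baselines3 API.

```python
import copy
from typing import Dict, List


def greedy_actions(q_values: List[List[float]]) -> List[int]:
    """For each row (one observation), return the index of the largest Q-value.

    Illustrates the intent of ``q_values.argmax(dim=1).reshape(-1)`` on a
    (batch, n_actions) tensor. Ties resolve to the lowest index here.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in q_values]


def hard_update(online: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Return target 'parameters' that are an independent deep copy of the online ones.

    Analogous to ``q_net_target.load_state_dict(q_net.state_dict())``.
    """
    return copy.deepcopy(online)


if __name__ == "__main__":
    print(greedy_actions([[0.1, 0.7, 0.2], [1.5, -0.3, 1.5]]))  # [1, 0]
    online = {"w": [0.5, -0.5]}
    target = hard_update(online)
    online["w"][0] = 9.0          # later training updates of the online net...
    print(target["w"])            # ...do not leak into the target: [0.5, -0.5]
```

The deep copy matters in this representation: a shallow `dict.copy()` would share the inner lists, so in-place updates to the online parameters would silently change the target as well.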

@@ -1,2 +0,0 @@
github stablebaseline3
https://github.com/DLR-RM/stable-baselines3

@@ -1,27 +0,0 @@
"In practice, we found that a high-entropy initial state tends to speed up training.
The entropy is calculated as:
$$H=-\sum_{k=1}^{n_k} p(k) \log p(k), \quad p(k)=\frac{|A_k|}{|\mathcal{A}|}$$
where $H$ is the entropy, $|A_k|$ is the number of agent nodes in the $k$-th cluster, and $|\mathcal{A}|$ is the total number of agents.
To ensure that the Cooperation Graph initialization has high entropy,
we randomly generate multiple initial states,
rank them by entropy, and pick the one with the maximum $H$."
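Below is a minimal sketch of the selection rule described above. It assumes that an initial state is a list assigning each agent to a cluster index, that candidate states are drawn uniformly at random, and that $\log$ is the natural logarithm. `cluster_entropy` and `pick_max_entropy_init` are hypothetical names. Taking `max` over the candidates is equivalent to ranking them by $H$ and keeping the top one.

```python
import math
import random
from collections import Counter
from typing import List, Optional


def cluster_entropy(assignment: List[int], n_clusters: int) -> float:
    """H = -sum_k p(k) * log p(k), with p(k) = |A_k| / |A|.

    Empty clusters contribute 0, following the convention 0 * log 0 = 0.
    """
    n_agents = len(assignment)
    counts = Counter(assignment)
    h = 0.0
    for k in range(n_clusters):
        p = counts.get(k, 0) / n_agents
        if p > 0:
            h -= p * math.log(p)
    return h


def pick_max_entropy_init(n_agents: int, n_clusters: int, n_candidates: int,
                          rng: Optional[random.Random] = None) -> List[int]:
    """Generate random cluster assignments and keep the one with the largest H."""
    rng = rng or random.Random()
    candidates = [
        [rng.randrange(n_clusters) for _ in range(n_agents)]
        for _ in range(n_candidates)
    ]
    return max(candidates, key=lambda a: cluster_entropy(a, n_clusters))


if __name__ == "__main__":
    best = pick_max_entropy_init(n_agents=12, n_clusters=4, n_candidates=50,
                                 rng=random.Random(0))
    print(best, round(cluster_entropy(best, 4), 3))
```

The upper bound is $\log n_k$, which is reached when every cluster holds the same number of agents. Drawing more candidates therefore pushes the chosen state toward a balanced partition.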
```
FROM ubuntu:latest
RUN apt-get update && \
apt-get install -y python3 python3-pip && \
rm -rf /var/lib/apt/lists/*
RUN echo '[global]' > /etc/pip.conf && \
echo 'index-url = https://mirrors.aliyun.com/pypi/simple/' >> /etc/pip.conf && \
echo 'trusted-host = mirrors.aliyun.com' >> /etc/pip.conf
RUN pip3 install gradio requests[socks] mdtex2html
COPY . /gpt
WORKDIR /gpt
CMD ["python3", "main.py"]
```

@@ -0,0 +1,114 @@
from pydantic import BaseModel, Field
from typing import List
from toolbox import update_ui_lastest_msg, disable_auto_promotion
from request_llm.bridge_all import predict_no_ui_long_connection
from crazy_functions.json_fns.pydantic_io import GptJsonIO, JsonStringError
import copy, json, pickle, os, sys, time
def read_avail_plugin_enum():
from crazy_functional import get_crazy_functions
plugin_arr = get_crazy_functions()
# remove plugins without an explanation
plugin_arr = {k:v for k, v in plugin_arr.items() if 'Info' in v}
plugin_arr_info = {"F_{:04d}".format(i):v["Info"] for i, v in enumerate(plugin_arr.values(), start=1)}
plugin_arr_dict = {"F_{:04d}".format(i):v for i, v in enumerate(plugin_arr.values(), start=1)}
plugin_arr_dict_parse = {"F_{:04d}".format(i):v for i, v in enumerate(plugin_arr.values(), start=1)}
plugin_arr_dict_parse.update({f"F_{i}":v for i, v in enumerate(plugin_arr.values(), start=1)})
prompt = json.dumps(plugin_arr_info, ensure_ascii=False, indent=2)
prompt = "\n\nThe definition of PluginEnum:\nPluginEnum=" + prompt
return prompt, plugin_arr_dict, plugin_arr_dict_parse
def wrap_code(txt):
txt = txt.replace('```','')
return f"\n```\n{txt}\n```\n"
def have_any_recent_upload_files(chatbot):
_5min = 5 * 60
if not chatbot: return False # chatbot is None
most_recent_uploaded = chatbot._cookies.get("most_recent_uploaded", None)
if not most_recent_uploaded: return False # most_recent_uploaded is None
if time.time() - most_recent_uploaded["time"] < _5min: return True # most_recent_uploaded is new
else: return False # most_recent_uploaded is too old
def get_recent_file_prompt_support(chatbot):
most_recent_uploaded = chatbot._cookies.get("most_recent_uploaded", None)
path = most_recent_uploaded['path']
prompt = "\nAdditional Information:\n"
prompt += "In case that this plugin requires a path or a file as argument, "
prompt += f"it is important for you to know that the user has recently uploaded a file, located at: `{path}`. "
prompt += "Only use it when necessary, otherwise, you can ignore this file."
return prompt
def get_inputs_show_user(inputs, plugin_arr_enum_prompt):
# remove plugin_arr_enum_prompt from inputs string
inputs_show_user = inputs.replace(plugin_arr_enum_prompt, "")
inputs_show_user += plugin_arr_enum_prompt[:200] + '...'
inputs_show_user += '\n...\n'
inputs_show_user += '...\n'
inputs_show_user += '...}'
return inputs_show_user
def execute_plugin(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_intention):
plugin_arr_enum_prompt, plugin_arr_dict, plugin_arr_dict_parse = read_avail_plugin_enum()
class Plugin(BaseModel):
plugin_selection: str = Field(description="The most related plugin from one of the PluginEnum.", default="F_0000")
reason_of_selection: str = Field(description="The reason why you should select this plugin.", default="This plugin satisfy user requirement most")
# ⭐ ⭐ ⭐ Select a plugin
yield from update_ui_lastest_msg(lastmsg=f"正在执行任务: {txt}\n\n查找可用插件中...", chatbot=chatbot, history=history, delay=0)
gpt_json_io = GptJsonIO(Plugin)
gpt_json_io.format_instructions = "The format of your output should be a json that can be parsed by json.loads.\n"
gpt_json_io.format_instructions += """Output example: {"plugin_selection":"F_1234", "reason_of_selection":"F_1234 plugin satisfy user requirement most"}\n"""
gpt_json_io.format_instructions += "The plugins you are authorized to use are listed below:\n"
gpt_json_io.format_instructions += plugin_arr_enum_prompt
inputs = "Choose the correct plugin according to user requirements, the user requirement is: \n\n" + \
">> " + txt.rstrip('\n').replace('\n','\n>> ') + '\n\n' + gpt_json_io.format_instructions
run_gpt_fn = lambda inputs, sys_prompt: predict_no_ui_long_connection(
inputs=inputs, llm_kwargs=llm_kwargs, history=[], sys_prompt=sys_prompt, observe_window=[])
try:
gpt_reply = run_gpt_fn(inputs, "")
plugin_sel = gpt_json_io.generate_output_auto_repair(gpt_reply, run_gpt_fn)
except JsonStringError:
msg = f"抱歉, {llm_kwargs['llm_model']}无法理解您的需求。"
msg += "请求的Prompt为\n" + wrap_code(get_inputs_show_user(inputs, plugin_arr_enum_prompt))
msg += "语言模型回复为:\n" + wrap_code(gpt_reply)
msg += "\n但您可以尝试再试一次\n"
yield from update_ui_lastest_msg(lastmsg=msg, chatbot=chatbot, history=history, delay=2)
return
if plugin_sel.plugin_selection not in plugin_arr_dict_parse:
msg = f"抱歉, 找不到合适插件执行该任务, 或者{llm_kwargs['llm_model']}无法理解您的需求。"
msg += f"语言模型{llm_kwargs['llm_model']}选择了不存在的插件:\n" + wrap_code(gpt_reply)
msg += "\n但您可以尝试再试一次\n"
yield from update_ui_lastest_msg(lastmsg=msg, chatbot=chatbot, history=history, delay=2)
return
# ⭐ ⭐ ⭐ Determine the plugin arguments
if not have_any_recent_upload_files(chatbot):
appendix_info = ""
else:
appendix_info = get_recent_file_prompt_support(chatbot)
plugin = plugin_arr_dict_parse[plugin_sel.plugin_selection]
yield from update_ui_lastest_msg(lastmsg=f"正在执行任务: {txt}\n\n提取插件参数...", chatbot=chatbot, history=history, delay=0)
class PluginExplicit(BaseModel):
plugin_selection: str = plugin_sel.plugin_selection
plugin_arg: str = Field(description="The argument of the plugin.", default="")
gpt_json_io = GptJsonIO(PluginExplicit)
gpt_json_io.format_instructions += "The information about this plugin is: " + plugin["Info"]
inputs = f"A plugin named {plugin_sel.plugin_selection} is selected, " + \
"you should extract plugin_arg from the user requirement, the user requirement is: \n\n" + \
">> " + (txt + appendix_info).rstrip('\n').replace('\n','\n>> ') + '\n\n' + \
gpt_json_io.format_instructions
run_gpt_fn = lambda inputs, sys_prompt: predict_no_ui_long_connection(
inputs=inputs, llm_kwargs=llm_kwargs, history=[], sys_prompt=sys_prompt, observe_window=[])
plugin_sel = gpt_json_io.generate_output_auto_repair(run_gpt_fn(inputs, ""), run_gpt_fn)
# ⭐ ⭐ ⭐ Run the plugin
fn = plugin['Function']
fn_name = fn.__name__
msg = f'{llm_kwargs["llm_model"]}为您选择了插件: `{fn_name}`\n\n插件说明:{plugin["Info"]}\n\n插件参数:{plugin_sel.plugin_arg}\n\n假如偏离了您的要求,按停止键终止。'
yield from update_ui_lastest_msg(lastmsg=msg, chatbot=chatbot, history=history, delay=2)
yield from fn(plugin_sel.plugin_arg, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, -1)
return
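`read_avail_plugin_enum` assigns the numbers `F_0001`, `F_0002`, ... to the documented plugins. `execute_plugin` then rejects any selection that is not a key of `plugin_arr_dict_parse`, which also accepts the unpadded form `F_1`. The stand-alone sketch below reproduces that indexing and lookup. `build_plugin_enum`, `resolve_selection` and the sample plugin table are hypothetical.

```python
import json
from typing import Any, Dict, Optional, Tuple

Plugin = Dict[str, Any]


def build_plugin_enum(plugins: Dict[str, Plugin]) -> Tuple[str, Dict[str, Plugin]]:
    """Number the documented plugins F_0001, F_0002, ... and build a lookup table.

    Plugins without an 'Info' entry are skipped. The lookup also accepts the
    unpadded key (F_1) in case the model drops the leading zeros.
    """
    documented = [v for v in plugins.values() if "Info" in v]
    info = {"F_{:04d}".format(i): v["Info"] for i, v in enumerate(documented, start=1)}
    lookup = {"F_{:04d}".format(i): v for i, v in enumerate(documented, start=1)}
    lookup.update({f"F_{i}": v for i, v in enumerate(documented, start=1)})
    prompt = "PluginEnum=" + json.dumps(info, ensure_ascii=False, indent=2)
    return prompt, lookup


def resolve_selection(selection: str, lookup: Dict[str, Plugin]) -> Optional[Plugin]:
    """Return the chosen plugin, or None if the model named a plugin that does not exist."""
    return lookup.get(selection)


if __name__ == "__main__":
    prompt, lookup = build_plugin_enum({"Translate": {"Info": "Translate text"}, "Internal": {}})
    print(prompt)
    print(resolve_selection("F_1", lookup))     # same entry as F_0001
    print(resolve_selection("F_0002", lookup))  # None: only one documented plugin
```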

@@ -0,0 +1,81 @@
from pydantic import BaseModel, Field
from typing import List
from toolbox import update_ui_lastest_msg, get_conf
from request_llm.bridge_all import predict_no_ui_long_connection
from crazy_functions.json_fns.pydantic_io import GptJsonIO
import copy, json, pickle, os, sys
def modify_configuration_hot(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_intention):
ALLOW_RESET_CONFIG, = get_conf('ALLOW_RESET_CONFIG')
if not ALLOW_RESET_CONFIG:
yield from update_ui_lastest_msg(
lastmsg=f"当前配置不允许被修改如需激活本功能,请在config.py中设置ALLOW_RESET_CONFIG=True后重启软件。",
chatbot=chatbot, history=history, delay=2
)
return
# ⭐ ⭐ ⭐ Read the configurable entries
names = {}
from enum import Enum
import config
for k, v in config.__dict__.items():
if k.startswith('__'): continue
names.update({k:k})
# if len(names) > 20: break # cap at the first 20 config entries; too many would confuse GPT
ConfigOptions = Enum('ConfigOptions', names)
class ModifyConfigurationIntention(BaseModel):
which_config_to_modify: ConfigOptions = Field(description="the name of the configuration to modify, you must choose from one of the ConfigOptions enum.", default=None)
new_option_value: str = Field(description="the new value of the option", default=None)
# ⭐ ⭐ ⭐ Analyze the user's intention
yield from update_ui_lastest_msg(lastmsg=f"正在执行任务: {txt}\n\n读取新配置中", chatbot=chatbot, history=history, delay=0)
gpt_json_io = GptJsonIO(ModifyConfigurationIntention)
inputs = "Analyze how to change configuration according to following user input, answer me with json: \n\n" + \
">> " + txt.rstrip('\n').replace('\n','\n>> ') + '\n\n' + \
gpt_json_io.format_instructions
run_gpt_fn = lambda inputs, sys_prompt: predict_no_ui_long_connection(
inputs=inputs, llm_kwargs=llm_kwargs, history=[], sys_prompt=sys_prompt, observe_window=[])
user_intention = gpt_json_io.generate_output_auto_repair(run_gpt_fn(inputs, ""), run_gpt_fn)
explicit_conf = user_intention.which_config_to_modify.value
ok = (explicit_conf in txt)
if ok:
yield from update_ui_lastest_msg(
lastmsg=f"正在执行任务: {txt}\n\n新配置{explicit_conf}={user_intention.new_option_value}",
chatbot=chatbot, history=history, delay=1
)
yield from update_ui_lastest_msg(
lastmsg=f"正在执行任务: {txt}\n\n新配置{explicit_conf}={user_intention.new_option_value}\n\n正在修改配置中",
chatbot=chatbot, history=history, delay=2
)
# ⭐ ⭐ ⭐ Apply the new configuration immediately
from toolbox import set_conf
set_conf(explicit_conf, user_intention.new_option_value)
yield from update_ui_lastest_msg(
lastmsg=f"正在执行任务: {txt}\n\n配置修改完成,刷新页面即可生效。", chatbot=chatbot, history=history, delay=1
)
else:
yield from update_ui_lastest_msg(
lastmsg=f"失败,如果需要配置{explicit_conf},您需要明确说明并在指令中提到它。", chatbot=chatbot, history=history, delay=5
)
def modify_configuration_reboot(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_intention):
ALLOW_RESET_CONFIG, = get_conf('ALLOW_RESET_CONFIG')
if not ALLOW_RESET_CONFIG:
yield from update_ui_lastest_msg(
lastmsg=f"当前配置不允许被修改如需激活本功能,请在config.py中设置ALLOW_RESET_CONFIG=True后重启软件。",
chatbot=chatbot, history=history, delay=2
)
return
yield from modify_configuration_hot(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_intention)
yield from update_ui_lastest_msg(
lastmsg=f"正在执行任务: {txt}\n\n配置修改完成,五秒后即将重启!若出现报错请无视即可。", chatbot=chatbot, history=history, delay=5
)
os.execl(sys.executable, sys.executable, *sys.argv)
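`modify_configuration_hot` converts every public name in `config.py` into a member of an `Enum`, so the model can only choose an option that exists. It then refuses the change unless the user's text names that option explicitly. The stand-alone sketch below covers those two steps. The fake `config` module and the helper names are hypothetical.

```python
import types
from enum import Enum


def build_config_enum(module: types.ModuleType) -> type:
    """Create an Enum whose members are the module's top-level names.

    As in the plugin, names starting with '__' are skipped and each member's
    value equals its own name.
    """
    names = {k: k for k in vars(module) if not k.startswith("__")}
    return Enum("ConfigOptions", names)


def is_explicitly_requested(option_name: str, user_text: str) -> bool:
    """Accept a change only if the user literally mentioned the option's name."""
    return option_name in user_text


if __name__ == "__main__":
    cfg = types.ModuleType("config")  # hypothetical stand-in for config.py
    cfg.LLM_MODEL = "gpt-3.5-turbo"
    cfg.WEB_PORT = 8080
    ConfigOptions = build_config_enum(cfg)
    print([m.value for m in ConfigOptions])
    print(is_explicitly_requested("LLM_MODEL", "set LLM_MODEL to gpt-4"))
```

The substring guard is deliberately conservative. It rejects paraphrases such as "use port 9000", because the model might otherwise map a vague request onto the wrong option.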

@@ -0,0 +1,28 @@
import pickle
class VoidTerminalState():
def __init__(self):
self.reset_state()
def reset_state(self):
self.has_provided_explaination = False
def lock_plugin(self, chatbot):
chatbot._cookies['lock_plugin'] = 'crazy_functions.虚空终端->虚空终端'
chatbot._cookies['plugin_state'] = pickle.dumps(self)
def unlock_plugin(self, chatbot):
self.reset_state()
chatbot._cookies['lock_plugin'] = None
chatbot._cookies['plugin_state'] = pickle.dumps(self)
def set_state(self, chatbot, key, value):
setattr(self, key, value)
chatbot._cookies['plugin_state'] = pickle.dumps(self)
def get_state(chatbot):
state = chatbot._cookies.get('plugin_state', None)
if state is not None: state = pickle.loads(state)
else: state = VoidTerminalState()
state.chatbot = chatbot
return state
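`VoidTerminalState` persists between two user submissions by pickling itself into `chatbot._cookies['plugin_state']`. Below is a minimal sketch of that store/restore cycle, assuming a hypothetical `FakeChatbot` in place of the real chatbot handle. The sketch departs from the source in one respect: it pickles only the attribute dictionary rather than the whole object, so unpickling does not depend on where the class is defined. The source pickles `self` directly.

```python
import pickle
from typing import Any, Dict


class FakeChatbot(list):
    """Hypothetical stand-in for the chatbot handle: a list carrying a _cookies dict."""

    def __init__(self) -> None:
        super().__init__()
        self._cookies: Dict[str, Any] = {}


class PluginState:
    """Per-session plugin state kept in the cookies between two submissions."""

    def __init__(self) -> None:
        self.has_provided_explanation = False

    def set_state(self, chatbot: FakeChatbot, key: str, value: Any) -> None:
        setattr(self, key, value)
        # Store only the attribute dict so unpickling does not need this class's import path.
        chatbot._cookies["plugin_state"] = pickle.dumps(vars(self))

    @classmethod
    def get_state(cls, chatbot: FakeChatbot) -> "PluginState":
        state = cls()  # defaults, used when nothing has been stored yet
        raw = chatbot._cookies.get("plugin_state")
        if raw is not None:
            state.__dict__.update(pickle.loads(raw))
        return state


if __name__ == "__main__":
    bot = FakeChatbot()
    PluginState.get_state(bot).set_state(bot, "has_provided_explanation", True)
    print(PluginState.get_state(bot).has_provided_explanation)
```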

@@ -1,5 +1,6 @@
from toolbox import update_ui
from toolbox import CatchException, report_execption, write_results_to_file, get_conf
from toolbox import update_ui, get_log_folder
from toolbox import write_history_to_file, promote_file_to_downloadzone
from toolbox import CatchException, report_execption, get_conf
import re, requests, unicodedata, os
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
def download_arxiv_(url_pdf):
@@ -28,7 +29,7 @@ def download_arxiv_(url_pdf):
if k in other_info['comment']:
title = k + ' ' + title
download_dir = './gpt_log/arxiv/'
download_dir = get_log_folder(plugin_name='arxiv')
os.makedirs(download_dir, exist_ok=True)
title_str = title.replace('?', '')\
@@ -40,9 +41,6 @@ def download_arxiv_(url_pdf):
requests_pdf_url = url_pdf
file_path = download_dir+title_str
# if os.path.exists(file_path):
# print('返回缓存文件')
# return './gpt_log/arxiv/'+title_str
print('下载中')
proxies, = get_conf('proxies')
@@ -61,7 +59,7 @@ def download_arxiv_(url_pdf):
.replace('\n', '')\
.replace(' ', ' ')\
.replace(' ', ' ')
return './gpt_log/arxiv/'+title_str, other_info
return file_path, other_info
def get_name(_url_):
@@ -144,11 +142,11 @@ def 下载arxiv论文并翻译摘要(txt, llm_kwargs, plugin_kwargs, chatbot, hi
# Try to import the dependencies; if any are missing, suggest how to install them
try:
import pdfminer, bs4
import bs4
except:
report_execption(chatbot, history,
a = f"解析项目: {txt}",
b = f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade pdfminer beautifulsoup4```。")
b = f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade beautifulsoup4```。")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
@@ -184,11 +182,10 @@ def 下载arxiv论文并翻译摘要(txt, llm_kwargs, plugin_kwargs, chatbot, hi
chatbot[-1] = (i_say_show_user, gpt_say)
history.append(i_say_show_user); history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # refresh the UI
# write to file
import shutil
# reset the file's creation time
shutil.copyfile(pdf_path, f'./gpt_log/{os.path.basename(pdf_path)}'); os.remove(pdf_path)
res = write_results_to_file(history)
res = write_history_to_file(history)
promote_file_to_downloadzone(res, chatbot=chatbot)
promote_file_to_downloadzone(pdf_path, chatbot=chatbot)
chatbot.append(("完成了吗?", res + "\n\nPDF文件也已经下载"))
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # refresh the UI

@@ -0,0 +1,63 @@
from toolbox import CatchException, update_ui
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
@CatchException
def 交互功能模板函数(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
"""
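txt text entered by the user in the input box, e.g. a passage to translate, or a path containing files to process
llm_kwargs GPT model parameters such as temperature and top_p; usually just passed through
plugin_kwargs plugin parameters; usually just passed through
chatbot handle of the chat display box, used to show output to the user
history chat history (the preceding context)
system_prompt silent prompt given to GPT
web_port port number the application is currently running on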
"""
history = [] # clear the history to avoid input overflow
chatbot.append(("这是什么功能?", "交互功能函数模板。在执行完成之后, 可以将自身的状态存储到cookie中, 等待用户的再次调用。"))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
state = chatbot._cookies.get('plugin_state_0001', None) # initialize the plugin state
if state is None:
chatbot._cookies['lock_plugin'] = 'crazy_functions.交互功能函数模板->交互功能模板函数' # lock the plugin: record the callback path so that the user's next submission goes straight to this function
chatbot._cookies['plugin_state_0001'] = 'wait_user_keyword' # set the plugin state
chatbot.append(("第一次调用:", "请输入关键词, 我将为您查找相关壁纸, 建议使用英文单词, 插件锁定中,请直接提交即可。"))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
if state == 'wait_user_keyword':
chatbot._cookies['lock_plugin'] = None # release the plugin lock so that a forgotten lock cannot cause a deadlock
chatbot._cookies['plugin_state_0001'] = None # clear the plugin state for the same reason
# plugin lock released
chatbot.append((f"获取关键词:{txt}", ""))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
page_return = get_image_page_by_keyword(txt)
inputs=inputs_show_user=f"Extract all image urls in this html page, pick the first 5 images and show them with markdown format: \n\n {page_return}"
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=inputs, inputs_show_user=inputs_show_user,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=[],
sys_prompt="When you want to show an image, use markdown format. e.g. ![image_description](image_url). If there are no image url provided, answer 'no image url provided'"
)
chatbot[-1] = [chatbot[-1][0], gpt_say]
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# ---------------------------------------------------------------------------------
def get_image_page_by_keyword(keyword):
import requests
from bs4 import BeautifulSoup
response = requests.get('https://wallhaven.cc/search', params={'q': keyword}, timeout=2) # let requests URL-encode the keyword
res = "image urls: \n"
for image_element in BeautifulSoup(response.content, 'html.parser').findAll("img"):
try:
res += image_element["data-src"]
res += "\n"
except KeyError: # skip <img> tags without a data-src attribute
pass
return res
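The template above implements a two-call state machine. The first call sets `lock_plugin`, so the next submission is routed back to the same function, and stores `'wait_user_keyword'`. The second call clears both cookies before doing the work. Below is a stripped-down sketch that uses a plain dict in place of `chatbot._cookies`. `keyword_plugin` and its callback path string are hypothetical.

```python
from typing import Any, Dict


def keyword_plugin(txt: str, cookies: Dict[str, Any]) -> str:
    """Hypothetical two-step plugin: first ask for a keyword, then consume it.

    cookies['lock_plugin'] routes the user's next submission straight back here;
    cookies['plugin_state_0001'] records which step we are on.
    """
    state = cookies.get("plugin_state_0001")
    if state is None:
        cookies["lock_plugin"] = "my_module->keyword_plugin"  # hypothetical callback path
        cookies["plugin_state_0001"] = "wait_user_keyword"
        return "Please enter a keyword."
    if state == "wait_user_keyword":
        # Release the lock and state first so a forgotten lock cannot trap the session.
        cookies["lock_plugin"] = None
        cookies["plugin_state_0001"] = None
        return f"Searching for: {txt}"
    return "Unknown state."


if __name__ == "__main__":
    cookies: Dict[str, Any] = {}
    print(keyword_plugin("", cookies))           # first call: asks for a keyword
    print(keyword_plugin("mountains", cookies))  # second call: consumes it
```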

@@ -1,138 +0,0 @@
import threading
from request_llm.bridge_all import predict_no_ui_long_connection
from toolbox import update_ui
from toolbox import CatchException, write_results_to_file, report_execption
from .crazy_utils import breakdown_txt_to_satisfy_token_limit
def extract_code_block_carefully(txt):
splitted = txt.split('```')
n_code_block_seg = len(splitted) - 1
if n_code_block_seg <= 1: return txt
# otherwise, drop everything before the first ``` and after the last ```
txt_out = '```'.join(splitted[1:-1])
return txt_out
def break_txt_into_half_at_some_linebreak(txt):
lines = txt.split('\n')
n_lines = len(lines)
pre = lines[:(n_lines//2)]
post = lines[(n_lines//2):]
return "\n".join(pre), "\n".join(post)
@CatchException
def 全项目切换英文(txt, llm_kwargs, plugin_kwargs, chatbot, history, sys_prompt, web_port):
# Step 1: clear the history to avoid input overflow
history = []
# Step 2: try to import the dependencies; if any are missing, suggest how to install them
try:
import tiktoken
except:
report_execption(chatbot, history,
a = f"解析项目: {txt}",
b = f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade tiktoken```。")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# Step 3: collect the files
import time, glob, os, shutil, re
os.makedirs('gpt_log/generated_english_version', exist_ok=True)
os.makedirs('gpt_log/generated_english_version/crazy_functions', exist_ok=True)
file_manifest = [f for f in glob.glob('./*.py') if ('test_project' not in f) and ('gpt_log' not in f)] + \
[f for f in glob.glob('./crazy_functions/*.py') if ('test_project' not in f) and ('gpt_log' not in f)]
# file_manifest = ['./toolbox.py']
i_say_show_user_buffer = []
# Step 4: show something immediately so the UI does not feel frozen
for index, fp in enumerate(file_manifest):
# if 'test_project' in fp: continue
with open(fp, 'r', encoding='utf-8', errors='replace') as f:
file_content = f.read()
i_say_show_user =f'[{index}/{len(file_manifest)}] 接下来请将以下代码中包含的所有中文转化为英文,只输出转化后的英文代码,请用代码块输出代码: {os.path.abspath(fp)}'
i_say_show_user_buffer.append(i_say_show_user)
chatbot.append((i_say_show_user, "[Local Message] 等待多线程操作,中间过程不予显示."))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# Step 5: truncation and handling under the token limit
MAX_TOKEN = 3000
from request_llm.bridge_all import model_info
enc = model_info["gpt-3.5-turbo"]['tokenizer']
def get_token_fn(txt): return len(enc.encode(txt, disallowed_special=()))
# Step 6: the task function
mutable_return = [None for _ in file_manifest]
observe_window = [[""] for _ in file_manifest]
def thread_worker(fp,index):
if index > 10:
time.sleep(60)
print('Openai 限制免费用户每分钟20次请求,降低请求频率中。')
with open(fp, 'r', encoding='utf-8', errors='replace') as f:
file_content = f.read()
i_say_template = lambda fp, file_content: f'接下来请将以下代码中包含的所有中文转化为英文,只输出代码,文件名是{fp},文件代码是 ```{file_content}```'
try:
gpt_say = ""
# split the code file into chunks
file_content_breakdown = breakdown_txt_to_satisfy_token_limit(file_content, get_token_fn, MAX_TOKEN)
for file_content_partial in file_content_breakdown:
i_say = i_say_template(fp, file_content_partial)
# # ** gpt request **
gpt_say_partial = predict_no_ui_long_connection(inputs=i_say, llm_kwargs=llm_kwargs, history=[], sys_prompt=sys_prompt, observe_window=observe_window[index])
gpt_say_partial = extract_code_block_carefully(gpt_say_partial)
gpt_say += gpt_say_partial
mutable_return[index] = gpt_say
except ConnectionAbortedError as token_exceed_err:
print('至少一个线程任务Token溢出而失败', token_exceed_err)
except Exception as e:
print('至少一个线程任务意外失败', e)
# Step 7: start all worker threads at once
handles = [threading.Thread(target=thread_worker, args=(fp,index)) for index, fp in enumerate(file_manifest)]
for h in handles:
h.daemon = True
h.start()
chatbot.append(('开始了吗?', f'多线程操作已经开始'))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# Step 8: poll in a loop until every thread has finished
cnt = 0
while True:
cnt += 1
time.sleep(0.2)
th_alive = [h.is_alive() for h in handles]
if not any(th_alive): break
# nicer visual feedback in the UI
observe_win = []
for thread_index, alive in enumerate(th_alive):
observe_win.append("[ ..."+observe_window[thread_index][0][-60:].replace('\n','').replace('```','...').replace(' ','.').replace('<br/>','.....').replace('$','.')+"... ]")
stat = [f'执行中: {obs}\n\n' if alive else '已完成\n\n' for alive, obs in zip(th_alive, observe_win)]
stat_str = ''.join(stat)
chatbot[-1] = (chatbot[-1][0], f'多线程操作已经开始,完成情况: \n\n{stat_str}' + ''.join(['.']*(cnt%10+1)))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# Step 9: write the results to files
for index, h in enumerate(handles):
h.join() # not really needed: every thread has already finished
fp = file_manifest[index]
gpt_say = mutable_return[index]
i_say_show_user = i_say_show_user_buffer[index]
where_to_relocate = f'gpt_log/generated_english_version/{fp}'
if gpt_say is not None:
with open(where_to_relocate, 'w+', encoding='utf-8') as f:
f.write(gpt_say)
else: # failed
shutil.copyfile(file_manifest[index], where_to_relocate)
chatbot.append((i_say_show_user, f'[Local Message] 已完成{os.path.abspath(fp)}的转化,\n\n存入{os.path.abspath(where_to_relocate)}'))
history.append(i_say_show_user); history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
time.sleep(1)
# Step 10: write a backup report file
res = write_results_to_file(history)
chatbot.append(("生成一份任务执行报告", res))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
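Steps 6 to 8 of `全项目切换英文` follow a common pattern. Each file gets its own daemon thread, and each thread writes into its own slot of a pre-sized list. Meanwhile, the main generator polls `is_alive()` and refreshes the UI until every thread has finished. Below is a self-contained sketch of that pattern. `run_all_and_poll` and the `work` callback are hypothetical. A `None` slot marks a failed worker, just as it does in `mutable_return`.

```python
import threading
import time
from typing import Callable, List, Optional


def run_all_and_poll(items: List[str], work: Callable[[str], str],
                     poll_interval: float = 0.05) -> List[Optional[str]]:
    """Start one daemon thread per item, then poll until all have finished.

    Each worker writes only to its own index of a pre-sized list, so no lock is
    needed; a slot stays None if that worker raised.
    """
    results: List[Optional[str]] = [None] * len(items)

    def worker(index: int, item: str) -> None:
        try:
            results[index] = work(item)
        except Exception as exc:  # keep the other workers running
            print(f"worker {index} failed: {exc}")

    handles = [threading.Thread(target=worker, args=(i, it), daemon=True)
               for i, it in enumerate(items)]
    for h in handles:
        h.start()
    while any(h.is_alive() for h in handles):
        time.sleep(poll_interval)  # the plugin refreshes the UI at this point
    return results


if __name__ == "__main__":
    print(run_all_and_poll(["a", "b", "c"], str.upper))
```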

@@ -0,0 +1,252 @@
# In this source file, ⭐ marks a key step
"""
Testing:
- Crop the image, keeping the bottom half.
- Swap the blue channel and red channel of the image.
- Convert the image to grayscale.
- Convert the CSV file to an Excel spreadsheet.
"""
from toolbox import CatchException, update_ui, gen_time_str, trimmed_format_exc, is_the_upload_folder
from toolbox import promote_file_to_downloadzone, get_log_folder, update_ui_lastest_msg
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, get_plugin_arg
from .crazy_utils import input_clipping, try_install_deps
from crazy_functions.gen_fns.gen_fns_shared import is_function_successfully_generated
from crazy_functions.gen_fns.gen_fns_shared import get_class_name
from crazy_functions.gen_fns.gen_fns_shared import subprocess_worker
from crazy_functions.gen_fns.gen_fns_shared import try_make_module
import os
import time
import glob
import multiprocessing
templete = """
```python
import ... # Put dependencies here, e.g. import numpy as np.
class TerminalFunction(object): # Do not change the name of the class, The name of the class must be `TerminalFunction`
def run(self, path): # The name of the function must be `run`, it takes only a positional argument.
# rewrite the function you have just written here
...
return generated_file_path
```
"""
def inspect_dependency(chatbot, history):
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return True
def get_code_block(reply):
import re
pattern = r"```([\s\S]*?)```" # regex pattern to match code blocks
matches = re.findall(pattern, reply) # find all code blocks in text
if len(matches) == 1:
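return re.sub(r'^python', '', matches[0]) # code block, minus the leading language tag (str.strip would remove any of the characters p/y/t/h/o/n from both ends)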
for match in matches:
if 'class TerminalFunction' in match:
return re.sub(r'^python', '', match) # code block, minus the leading language tag
raise RuntimeError("GPT is not generating proper code.")
def gpt_interact_multi_step(txt, file_type, llm_kwargs, chatbot, history):
# input
prompt_compose = [
f'Your job:\n'
f'1. write a single Python function, which takes a path of a `{file_type}` file as the only argument and returns a `string` containing the result of analysis or the path of generated files. \n',
f"2. You should write this function to perform following task: " + txt + "\n",
f"3. Wrap the output python function with markdown codeblock."
]
i_say = "".join(prompt_compose)
demo = []
# Step 1
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say, inputs_show_user=i_say,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=demo,
sys_prompt= r"You are a world-class programmer."
)
history.extend([i_say, gpt_say])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# Step 2
prompt_compose = [
"If the previous stage is successful, rewrite the function you have just written to satisfy the following template: \n",
templete
]
i_say = "".join(prompt_compose); inputs_show_user = "If the previous stage is successful, rewrite the function you have just written to satisfy the executable template. "
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say, inputs_show_user=inputs_show_user,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
sys_prompt= r"You are a programmer. You need to replace `...` with valid packages, do not give `...` in your answer!"
)
code_to_return = gpt_say
history.extend([i_say, gpt_say])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# # Step 3
# i_say = "Please list the packages to install to run the code above. Then show me how to use `try_install_deps` function to install them."
# i_say += 'For instance. `try_install_deps(["opencv-python", "scipy", "numpy"])`'
# installation_advance = yield from request_gpt_model_in_new_thread_with_ui_alive(
# inputs=i_say, inputs_show_user=inputs_show_user,
# llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
# sys_prompt= r"You are a programmer."
# )
# # # Step 3
# i_say = "Show me how to use `pip` to install packages to run the code above. "
# i_say += 'For instance, `pip install opencv-python scipy numpy`'
# installation_advance = yield from request_gpt_model_in_new_thread_with_ui_alive(
# inputs=i_say, inputs_show_user=i_say,
# llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
# sys_prompt= r"You are a programmer."
# )
installation_advance = ""
return code_to_return, installation_advance, txt, file_type, llm_kwargs, chatbot, history
def for_immediate_show_off_when_possible(file_type, fp, chatbot):
if file_type in ['png', 'jpg']:
image_path = os.path.abspath(fp)
chatbot.append(['这是一张图片, 展示如下:',
f'本地文件地址: <br/>`{image_path}`<br/>'+
f'本地文件预览: <br/><div align="center"><img src="file={image_path}"></div>'
])
return chatbot
def have_any_recent_upload_files(chatbot):
_5min = 5 * 60
if not chatbot: return False # chatbot is None
most_recent_uploaded = chatbot._cookies.get("most_recent_uploaded", None)
if not most_recent_uploaded: return False # most_recent_uploaded is None
if time.time() - most_recent_uploaded["time"] < _5min: return True # most_recent_uploaded is new
else: return False # most_recent_uploaded is too old
def get_recent_file_prompt_support(chatbot):
most_recent_uploaded = chatbot._cookies.get("most_recent_uploaded", None)
path = most_recent_uploaded['path']
return path
@CatchException
def 函数动态生成(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
"""
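txt text entered by the user in the input box, e.g. a passage to translate, or a path containing files to process
llm_kwargs GPT model parameters such as temperature and top_p; usually just passed through
plugin_kwargs plugin parameters; currently unused
chatbot handle of the chat display box, used to show output to the user
history chat history (the preceding context)
system_prompt silent prompt given to GPT
web_port port number the application is currently running on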
"""
# clear the history
history = []
# basic info: feature and contributor
chatbot.append(["正在启动: 插件动态生成插件", "插件动态生成, 执行开始, 作者Binary-Husky."])
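yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# ⭐ check whether the upload area contains anything
# 1. if there are files: use them as the function argument
# 2. if there are no files, GPT would have to extract the argument (not implemented yet; the Void Terminal plugin already has similar code)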
file_list = []
if get_plugin_arg(plugin_kwargs, key="file_path_arg", default=False):
file_path = get_plugin_arg(plugin_kwargs, key="file_path_arg", default=None)
file_list.append(file_path)
yield from update_ui_lastest_msg(f"当前文件: {file_path}", chatbot, history, 1)
elif have_any_recent_upload_files(chatbot):
file_dir = get_recent_file_prompt_support(chatbot)
file_list = glob.glob(os.path.join(file_dir, '**/*'), recursive=True)
yield from update_ui_lastest_msg(f"当前文件处理列表: {file_list}", chatbot, history, 1)
else:
chatbot.append(["文件检索", "没有发现任何近期上传的文件。"])
yield from update_ui_lastest_msg("没有发现任何近期上传的文件。", chatbot, history, 1)
return # 2. no files found
if len(file_list) == 0:
chatbot.append(["文件检索", "没有发现任何近期上传的文件。"])
yield from update_ui_lastest_msg("没有发现任何近期上传的文件。", chatbot, history, 1)
return # 2. no files found
# read the files
file_type = file_list[0].split('.')[-1]
# sanity check: the input box holds the upload folder path instead of a request
if is_the_upload_folder(txt):
yield from update_ui_lastest_msg(f"请在输入框内填写需求, 然后再次点击该插件! 至于您的文件,不用担心, 文件路径 {txt} 已经被记忆. ", chatbot, history, 1)
return
# now do the actual work
MAX_TRY = 3
for j in range(MAX_TRY): # retry up to MAX_TRY times
traceback = ""
try:
# ⭐ here we go
code, installation_advance, txt, file_type, llm_kwargs, chatbot, history = \
yield from gpt_interact_multi_step(txt, file_type, llm_kwargs, chatbot, history)
chatbot.append(["代码生成阶段结束", ""])
yield from update_ui_lastest_msg(f"正在验证上述代码的有效性 ...", chatbot, history, 1)
# ⭐ extract the code block
code = get_code_block(code)
# ⭐ check that the code loads as a module
ok, traceback = try_make_module(code, chatbot)
# code generation succeeded
if ok: break
except Exception as e:
if not traceback: traceback = trimmed_format_exc()
# handle the failure
if not traceback: traceback = trimmed_format_exc()
yield from update_ui_lastest_msg(f"{j+1}/{MAX_TRY} 次代码生成尝试, 失败了~ 别担心, 我们5秒后再试一次... \n\n此次我们的错误追踪是\n```\n{traceback}\n```\n", chatbot, history, 5)
# code generation finished; start execution
TIME_LIMIT = 15
yield from update_ui_lastest_msg(f"开始创建新进程并执行代码! 时间限制 {TIME_LIMIT} 秒. 请等待任务完成... ", chatbot, history, 1)
manager = multiprocessing.Manager()
return_dict = manager.dict()
# ⭐ final step: process the files one by one
for file_path in file_list:
if os.path.exists(file_path):
chatbot.append([f"正在处理文件: {file_path}", f"请稍等..."])
chatbot = for_immediate_show_off_when_possible(file_type, file_path, chatbot)
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
else:
continue
# ⭐⭐⭐ subprocess_worker ⭐⭐⭐
p = multiprocessing.Process(target=subprocess_worker, args=(code, file_path, return_dict))
# ⭐ run with a time limit of TIME_LIMIT seconds
return_dict.clear() # drop results left over from the previous file
p.start(); p.join(timeout=TIME_LIMIT)
if p.is_alive(): p.terminate(); p.join()
p.close()
# a worker killed on timeout never fills return_dict, so read it defensively
res = return_dict.get('result')
success = return_dict.get('success', False)
traceback = return_dict.get('traceback', f"timed out after {TIME_LIMIT} s")
if not success:
if not traceback: traceback = trimmed_format_exc()
chatbot.append(["执行失败了", f"错误追踪\n```\n{traceback}\n```\n"])
# chatbot.append(["如果是缺乏依赖,请参考以下建议", installation_advance])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# finished successfully; wrap up
res = str(res)
if os.path.exists(res):
chatbot.append(["执行成功了,结果是一个有效文件", "结果:" + res])
new_file_path = promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot = for_immediate_show_off_when_possible(file_type, new_file_path, chatbot)
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
else:
chatbot.append(["执行成功了,结果是一个字符串", "结果:" + res])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
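In `函数动态生成` above, the upload area counts as usable only if `most_recent_uploaded["time"]` is less than five minutes old, and code generation is retried up to `MAX_TRY` times until `try_make_module` reports success. A minimal sketch of both checks as plain functions, assuming the `{"path": ..., "time": ...}` record shape read by `get_recent_file_prompt_support`; `is_recent_upload` and `generate_with_retries` are hypothetical names, and the `attempt` callable stands in for the GPT call plus `try_make_module`.

```python
import time
from typing import Callable, Optional, Tuple

RECENT_UPLOAD_WINDOW_S = 5 * 60  # same 5-minute window as have_any_recent_upload_files


def is_recent_upload(record: Optional[dict], now: Optional[float] = None,
                     window_s: float = RECENT_UPLOAD_WINDOW_S) -> bool:
    """True if an upload record like {"path": ..., "time": <epoch s>} is younger than window_s."""
    if not record:
        return False
    if now is None:
        now = time.time()
    return now - record["time"] < window_s


def generate_with_retries(attempt: Callable[[int], Tuple[bool, str]],
                          max_try: int = 3) -> Tuple[bool, int, str]:
    """Call attempt(j) up to max_try times.

    attempt returns (ok, traceback_text); an exception counts as a failed try.
    Returns (ok, number_of_tries_used, last_traceback).
    """
    last_tb = ""
    for j in range(max_try):
        try:
            ok, last_tb = attempt(j)
        except Exception as e:  # the plugin records trimmed_format_exc() here instead
            ok, last_tb = False, repr(e)
        if ok:
            return True, j + 1, ""
    return False, max_try, last_tb
```

Passing `now` explicitly keeps the freshness check deterministic in tests, and returning the last traceback mirrors how the plugin shows the error from the final failed attempt.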

@@ -0,0 +1,31 @@
from toolbox import CatchException, update_ui, gen_time_str
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import input_clipping
import copy, json
@CatchException
def 命令行助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
"""
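txt text typed by the user in the input box, e.g. a paragraph to translate, or a path containing files to process
llm_kwargs GPT model parameters such as temperature and top_p; usually passed through unchanged
plugin_kwargs plugin parameters; currently unused
chatbot handle of the chat display box, used to show output to the user
history chat history (prior context)
system_prompt silent system prompt for GPT
web_port port the application is currently running on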
"""
# clear the history so the input does not overflow
history = []
# input
i_say = "请写bash命令实现以下功能" + txt
# start
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say, inputs_show_user=txt,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=[],
sys_prompt="你是一个Linux大师级用户。注意,当我要求你写bash命令时,尽可能地仅用一行命令解决我的要求。"
)
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
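`命令行助手` only displays the model's reply; its system prompt asks for a one-line bash command. If a caller wanted just the command string, a small post-processing step could pull it out of the reply. This is a hypothetical helper, not part of the plugin, and it assumes the model wraps the command in a Markdown code fence (optionally tagged `bash`, `shell` or `sh`), falling back to the first non-empty line otherwise.

```python
import re

# Opening fence of three backticks (optionally tagged), then a lazy capture up to the closing fence.
_FENCE = re.compile(r"`{3}(?:bash|shell|sh)?[ \t]*\n(.*?)`{3}", re.DOTALL)


def extract_command(reply: str) -> str:
    """Return the first non-empty line of the first fenced block, or of the reply itself."""
    match = _FENCE.search(reply)
    body = match.group(1) if match else reply
    for line in body.splitlines():
        if line.strip():
            return line.strip()
    return ""
```

Taking only the first line matches the system prompt's request for a single command; a multi-line answer would need the whole captured block instead.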

@@ -0,0 +1,69 @@
from toolbox import CatchException, update_ui, get_conf, select_api_key, get_log_folder
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
import datetime
def gen_image(llm_kwargs, prompt, resolution="256x256"):
import requests, json, time, os
from request_llm.bridge_all import model_info
proxies, = get_conf('proxies')
# Set up OpenAI API key and model
api_key = select_api_key(llm_kwargs['api_key'], llm_kwargs['llm_model'])
chat_endpoint = model_info[llm_kwargs['llm_model']]['endpoint']
# 'https://api.openai.com/v1/chat/completions'
img_endpoint = chat_endpoint.replace('chat/completions','images/generations')
# Generate the image
url = img_endpoint
headers = {
'Authorization': f"Bearer {api_key}",
'Content-Type': 'application/json'
}
data = {
'prompt': prompt,
'n': 1,
'size': resolution,
'response_format': 'url'
}
response = requests.post(url, headers=headers, json=data, proxies=proxies)
print(response.content)
try:
image_url = json.loads(response.content.decode('utf8'))['data'][0]['url']
except:
raise RuntimeError(response.content.decode())
# save the image to a local file
r = requests.get(image_url, proxies=proxies)
file_path = f'{get_log_folder()}/image_gen/'
os.makedirs(file_path, exist_ok=True)
file_name = 'Image' + time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime()) + '.png'
with open(file_path+file_name, 'wb+') as f: f.write(r.content)
return image_url, file_path+file_name
@CatchException
def 图片生成(prompt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
"""
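prompt text typed by the user in the input box, used as the image-generation prompt
llm_kwargs GPT model parameters such as temperature and top_p; usually passed through unchanged
plugin_kwargs plugin parameters; the advanced argument sets the resolution (default 256x256)
chatbot handle of the chat display box, used to show output to the user
history chat history (prior context)
system_prompt silent system prompt for GPT
web_port port the application is currently running on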
"""
history = [] # clear the history so the input does not overflow
chatbot.append(("这是什么功能?", "[Local Message] 生成图像, 请先把模型切换至gpt-*或者api2d-*。如果中文效果不理想, 请尝试英文Prompt。正在处理中 ....."))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI now, because the request takes a while
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
resolution = plugin_kwargs.get("advanced_arg", '256x256')
image_url, image_path = gen_image(llm_kwargs, prompt, resolution)
chatbot.append([prompt,
f'图像中转网址: <br/>`{image_url}`<br/>'+
f'中转网址预览: <br/><div align="center"><img src="{image_url}"></div>'
f'本地文件地址: <br/>`{image_path}`<br/>'+
f'本地文件预览: <br/><div align="center"><img src="file={image_path}"></div>'
])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
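`gen_image` derives the image endpoint by rewriting the chat endpoint (`chat/completions` to `images/generations`) and posts a JSON body with `prompt`, `n`, `size` and `response_format`, where `size` comes from the plugin's advanced argument and defaults to `256x256`. A minimal sketch of that request construction with up-front validation; the `ALLOWED_SIZES` set is an assumption based on the DALL·E 2 sizes of the images API, and the plugin itself does not validate the size.

```python
ALLOWED_SIZES = {"256x256", "512x512", "1024x1024"}  # assumption: DALL-E 2 image sizes


def image_endpoint_from_chat(chat_endpoint: str) -> str:
    """Rewrite .../v1/chat/completions into .../v1/images/generations, as gen_image does."""
    if "chat/completions" not in chat_endpoint:
        raise ValueError(f"not a chat completions endpoint: {chat_endpoint}")
    return chat_endpoint.replace("chat/completions", "images/generations")


def build_image_request(prompt: str, resolution: str = "") -> dict:
    """Build the JSON body; an empty advanced_arg falls back to 256x256 like the plugin."""
    size = resolution.strip() or "256x256"
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size {size!r}; expected one of {sorted(ALLOWED_SIZES)}")
    return {"prompt": prompt, "n": 1, "size": size, "response_format": "url"}
```

Rejecting an unsupported size locally gives the user a clear message instead of an opaque API error string.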

@@ -1,4 +1,4 @@
from toolbox import CatchException, update_ui
from toolbox import CatchException, update_ui, promote_file_to_downloadzone, get_log_folder
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
import re
@@ -10,9 +10,9 @@ def write_chat_to_file(chatbot, history=None, file_name=None):
import time
if file_name is None:
file_name = 'chatGPT对话历史' + time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime()) + '.html'
os.makedirs('./gpt_log/', exist_ok=True)
with open(f'./gpt_log/{file_name}', 'w', encoding='utf8') as f:
from theme import advanced_css
fp = os.path.join(get_log_folder(), file_name)
with open(fp, 'w', encoding='utf8') as f:
from themes.theme import advanced_css
f.write(f'<!DOCTYPE html><head><meta charset="utf-8"><title>对话历史</title><style>{advanced_css}</style></head>')
for i, contents in enumerate(chatbot):
for j, content in enumerate(contents):
@@ -29,9 +29,8 @@ def write_chat_to_file(chatbot, history=None, file_name=None):
for h in history:
f.write("\n>>>" + h)
f.write('</code>')
res = '对话历史写入:' + os.path.abspath(f'./gpt_log/{file_name}')
print(res)
return res
promote_file_to_downloadzone(fp, rename_file=file_name, chatbot=chatbot)
return '对话历史写入:' + fp
def gen_file_preview(file_name):
try:
@@ -107,7 +106,7 @@ def 载入对话历史存档(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
if not success:
if txt == "": txt = '空空如也的输入栏'
import glob
local_history = "<br/>".join(["`"+hide_cwd(f)+f" ({gen_file_preview(f)})"+"`" for f in glob.glob(f'gpt_log/**/chatGPT对话历史*.html', recursive=True)])
local_history = "<br/>".join(["`"+hide_cwd(f)+f" ({gen_file_preview(f)})"+"`" for f in glob.glob(f'{get_log_folder()}/**/chatGPT对话历史*.html', recursive=True)])
chatbot.append([f"正在查找对话历史文件html格式: {txt}", f"找不到任何html文件: {txt}。但本地存储了以下历史文件,您可以将任意一个文件路径粘贴到输入区,然后重试:<br/>{local_history}"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
@@ -133,8 +132,8 @@ def 删除所有本地对话历史记录(txt, llm_kwargs, plugin_kwargs, chatbot
"""
import glob, os
local_history = "<br/>".join(["`"+hide_cwd(f)+"`" for f in glob.glob(f'gpt_log/**/chatGPT对话历史*.html', recursive=True)])
for f in glob.glob(f'gpt_log/**/chatGPT对话历史*.html', recursive=True):
local_history = "<br/>".join(["`"+hide_cwd(f)+"`" for f in glob.glob(f'{get_log_folder()}/**/chatGPT对话历史*.html', recursive=True)])
for f in glob.glob(f'{get_log_folder()}/**/chatGPT对话历史*.html', recursive=True):
os.remove(f)
chatbot.append([f"删除所有历史对话文件", f"已删除<br/>{local_history}"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
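After this change, `write_chat_to_file` writes into `get_log_folder()` instead of `./gpt_log/`, and both `载入对话历史存档` and `删除所有本地对话历史记录` find archives with a recursive glob for `chatGPT对话历史*.html`. A minimal standard-library sketch of the naming and lookup; a plain directory argument stands in for `get_log_folder()` here, which is an assumption for illustration.

```python
import glob
import os
import time
from typing import List, Optional

PREFIX = "chatGPT对话历史"  # file-name prefix used by write_chat_to_file


def history_file_name(now: Optional[float] = None) -> str:
    """Timestamped name in the same %Y-%m-%d-%H-%M-%S format as the plugin."""
    stamp = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime(now))
    return PREFIX + stamp + ".html"


def list_history_files(log_folder: str) -> List[str]:
    """All archives under log_folder; with recursive=True, '**' also matches zero subfolders."""
    pattern = os.path.join(log_folder, "**", PREFIX + "*.html")
    return sorted(glob.glob(pattern, recursive=True))
```

Because `**` can match zero directories, files written directly into the log folder and files in nested subfolders are both found.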

@@ -1,5 +1,6 @@
from toolbox import update_ui
from toolbox import CatchException, report_execption, write_results_to_file
from toolbox import CatchException, report_execption
from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
fast_debug = False
@@ -14,17 +15,19 @@ def 解析docx(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot
doc = Document(fp)
file_content = "\n".join([para.text for para in doc.paragraphs])
else:
import win32com.client
word = win32com.client.Dispatch("Word.Application")
word.visible = False
# open the file
print('fp', os.getcwd())
doc = word.Documents.Open(os.getcwd() + '/' + fp)
# file_content = doc.Content.Text
doc = word.ActiveDocument
file_content = doc.Range().Text
doc.Close()
word.Quit()
try:
import win32com.client
word = win32com.client.Dispatch("Word.Application")
word.visible = False
# open the file
doc = word.Documents.Open(os.getcwd() + '/' + fp)
# file_content = doc.Content.Text
doc = word.ActiveDocument
file_content = doc.Range().Text
doc.Close()
word.Quit()
except:
raise RuntimeError('请先将.doc文档转换为.docx文档。')
print(file_content)
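# File names under private_upload are easily garbled after unzipping a zip (rar and 7z are fine), so only the document content is analyzed and the file name is left out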
@@ -69,11 +72,13 @@ def 解析docx(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot
history.extend([i_say,gpt_say])
this_paper_history.extend([i_say,gpt_say])
res = write_results_to_file(history)
res = write_history_to_file(history)
promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot.append(("完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
res = write_results_to_file(history)
res = write_history_to_file(history)
promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot.append(("所有文件都总结完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
@@ -85,7 +90,7 @@ def 总结word文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_pr
# basic info: feature and contributor
chatbot.append([
"函数插件功能?",
"批量总结Word文档。函数插件贡献者: JasonGuo1"])
"批量总结Word文档。函数插件贡献者: JasonGuo1。注意, 如果是.doc文件, 请先转化为.docx格式。"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# try importing the dependencies; if any is missing, suggest how to install it
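The `.doc` branch above needs Microsoft Word through `win32com`, so the commit now fails with a hint to convert the file to `.docx` first. The plugin reads `.docx` files with python-docx (`Document(fp).paragraphs`). As a dependency-free alternative, a `.docx` file is a ZIP archive whose body text sits in `<w:t>` elements of `word/document.xml`; the sketch below swaps in that standard-library approach and keeps the same `.doc` error. Headers, footers and footnotes live in other parts of the archive and are skipped.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_word_text(fp: str) -> str:
    """Return the paragraph text of a .docx file, one paragraph per line."""
    ext = os.path.splitext(fp)[1].lower()
    if ext == ".doc":
        # legacy binary format: mirror the plugin and ask for a conversion
        raise RuntimeError("Please convert the .doc file to .docx first.")
    if ext != ".docx":
        raise ValueError(f"unsupported extension: {ext!r}")
    with zipfile.ZipFile(fp) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):  # every <w:p> in document order, including table cells
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

Joining the runs of each paragraph without separators matches how Word splits one sentence across several `<w:t>` elements.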

@@ -0,0 +1,186 @@
from toolbox import CatchException, report_execption, select_api_key, update_ui, get_conf
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from toolbox import write_history_to_file, promote_file_to_downloadzone, get_log_folder
def split_audio_file(filename, split_duration=1000):
"""
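Split an audio file into several segments of the given duration.
Args:
filename (str): name of the audio file to split.
split_duration (int, optional): length of each audio segment in seconds. Defaults to 1000.
Returns:
filelist (list): list of the file paths of all audio segments.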
"""
from moviepy.editor import AudioFileClip
import os
os.makedirs(f"{get_log_folder(plugin_name='audio')}/mp3/cut/", exist_ok=True) # create the folder for the split audio
# read the audio file
audio = AudioFileClip(filename)
# compute the total duration and the split points
total_duration = audio.duration
split_points = list(range(0, int(total_duration), split_duration))
split_points.append(int(total_duration))
filelist = []
# split the audio file
for i in range(len(split_points) - 1):
start_time = split_points[i]
end_time = split_points[i + 1]
split_audio = audio.subclip(start_time, end_time)
split_audio.write_audiofile(f"{get_log_folder(plugin_name='audio')}/mp3/cut/{filename[0]}_{i}.mp3")
filelist.append(f"{get_log_folder(plugin_name='audio')}/mp3/cut/{filename[0]}_{i}.mp3")
audio.close()
return filelist
def AnalyAudio(parse_prompt, file_manifest, llm_kwargs, chatbot, history):
import os, requests
from moviepy.editor import AudioFileClip
from request_llm.bridge_all import model_info
# set up the OpenAI key and model
api_key = select_api_key(llm_kwargs['api_key'], llm_kwargs['llm_model'])
chat_endpoint = model_info[llm_kwargs['llm_model']]['endpoint']
whisper_endpoint = chat_endpoint.replace('chat/completions', 'audio/transcriptions')
url = whisper_endpoint
headers = {
'Authorization': f"Bearer {api_key}"
}
os.makedirs(f"{get_log_folder(plugin_name='audio')}/mp3/", exist_ok=True)
for index, fp in enumerate(file_manifest):
audio_history = []
# extract the file extension
ext = os.path.splitext(fp)[1]
# extract the audio track from video files
if ext not in [".mp3", ".wav", ".m4a", ".mpga"]:
audio_clip = AudioFileClip(fp)
audio_clip.write_audiofile(f"{get_log_folder(plugin_name='audio')}/mp3/output{index}.mp3")
fp = f"{get_log_folder(plugin_name='audio')}/mp3/output{index}.mp3"
# transcribe the audio to text with the whisper model
voice = split_audio_file(fp)
for j, i in enumerate(voice):
with open(i, 'rb') as f:
file_content = f.read() # read the file content into memory
files = {
'file': (os.path.basename(i), file_content),
}
data = {
"model": "whisper-1",
"prompt": parse_prompt,
'response_format': "text"
}
chatbot.append([f"{i} 发送到openai音频解析终端 (whisper),当前参数:{parse_prompt}", "正在处理 ..."])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
proxies, = get_conf('proxies')
response = requests.post(url, headers=headers, files=files, data=data, proxies=proxies).text
chatbot.append(["音频解析结果", response])
history.extend(["音频解析结果", response])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
i_say = f'请对下面的音频片段做概述,音频内容是 ```{response}```'
i_say_show_user = f'{index + 1}段音频的第{j + 1} / {len(voice)}片段。'
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say,
inputs_show_user=i_say_show_user,
llm_kwargs=llm_kwargs,
chatbot=chatbot,
history=[],
sys_prompt=f"总结音频。音频文件名{fp}"
)
chatbot[-1] = (i_say_show_user, gpt_say)
history.extend([i_say_show_user, gpt_say])
audio_history.extend([i_say_show_user, gpt_say])
# All segments of this file have been summarized; if the file was split, summarize it as a whole
result = "".join(audio_history)
if len(audio_history) > 1:
i_say = f"根据以上的对话,使用中文总结音频“{result}”的主要内容。"
i_say_show_user = f'{index + 1}段音频的主要内容:'
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say,
inputs_show_user=i_say_show_user,
llm_kwargs=llm_kwargs,
chatbot=chatbot,
history=audio_history,
sys_prompt="总结文章。"
)
history.extend([i_say, gpt_say])
audio_history.extend([i_say, gpt_say])
res = write_history_to_file(history)
promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot.append((f"{index + 1}段音频完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# delete the intermediate folder
import shutil
shutil.rmtree(f"{get_log_folder(plugin_name='audio')}/mp3")
res = write_history_to_file(history)
promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot.append(("所有音频都总结完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history)
@CatchException
def 总结音视频(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, WEB_PORT):
import glob, os
# basic info: feature and contributor
chatbot.append([
"函数插件功能?",
"总结音视频内容,函数插件贡献者: dalvqw & BinaryHusky"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
try:
from moviepy.editor import AudioFileClip
except:
report_execption(chatbot, history,
a=f"解析项目: {txt}",
b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade moviepy```。")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# clear the history so the input does not overflow
history = []
# check the input argument; exit if none was given
if os.path.exists(txt):
project_folder = txt
else:
if txt == "": txt = '空空如也的输入栏'
report_execption(chatbot, history, a=f"解析项目: {txt}", b=f"找不到本地项目或无权访问: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# collect the list of files to process
extensions = ['.mp4', '.m4a', '.wav', '.mpga', '.mpeg', '.mp3', '.avi', '.mkv', '.flac', '.aac']
if txt.endswith(tuple(extensions)):
file_manifest = [txt]
else:
file_manifest = []
for extension in extensions:
file_manifest.extend(glob.glob(f'{project_folder}/**/*{extension}', recursive=True))
# no files found
if len(file_manifest) == 0:
report_execption(chatbot, history, a=f"解析项目: {txt}", b=f"找不到任何音频或视频文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# start the actual task
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
parse_prompt = plugin_kwargs.get("advanced_arg", '将音频解析为简体中文')
yield from AnalyAudio(parse_prompt, file_manifest, llm_kwargs, chatbot, history)
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
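`split_audio_file` builds its cut points from `range(0, int(total_duration), split_duration)` plus `int(total_duration)`. As a result, a clip shorter than one second yields no segment at all, and a fractional tail (e.g. the last 0.5 s of a 2500.5 s file) is dropped. A minimal sketch that computes the same 1000-second segments in floating point; `segment_bounds` is a hypothetical name, and each pair would be passed to `AudioFileClip.subclip(start, end)` as in the original loop.

```python
from typing import List, Tuple


def segment_bounds(total_duration: float, split_duration: float = 1000) -> List[Tuple[float, float]]:
    """(start, end) pairs in seconds that cover [0, total_duration] without gaps."""
    if split_duration <= 0:
        raise ValueError("split_duration must be positive")
    bounds = []
    start = 0.0
    while start < total_duration:
        end = min(start + split_duration, total_duration)
        bounds.append((start, end))
        start = end
    return bounds
```

Clamping the last `end` to `total_duration` keeps the final, shorter segment instead of truncating it to a whole second.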

@@ -1,5 +1,7 @@
from toolbox import update_ui
from toolbox import CatchException, report_execption, write_results_to_file
import glob, time, os, re, logging
from toolbox import update_ui, trimmed_format_exc, gen_time_str, disable_auto_promotion
from toolbox import CatchException, report_execption, get_log_folder
from toolbox import write_history_to_file, promote_file_to_downloadzone
fast_debug = False
class PaperFileGroup():
@@ -32,11 +34,23 @@ class PaperFileGroup():
self.sp_file_contents.append(segment)
self.sp_file_index.append(index)
self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.md")
logging.info('Segmentation: done')
print('Segmentation: done')
def merge_result(self):
self.file_result = ["" for _ in range(len(self.file_paths))]
for r, k in zip(self.sp_file_result, self.sp_file_index):
self.file_result[k] += r
def write_result(self, language):
manifest = []
for path, res in zip(self.file_paths, self.file_result):
dst_file = os.path.join(get_log_folder(), f'{gen_time_str()}.md')
with open(dst_file, 'w', encoding='utf8') as f:
manifest.append(dst_file)
f.write(res)
return manifest
def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='en'):
import time, os, re
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
# <-------- read the Markdown files and remove all comments ---------->
@@ -53,7 +67,7 @@ def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
pfg.run_file_split(max_token_limit=1500)
n_split = len(pfg.sp_file_contents)
# <-------- multithreaded polishing starts ---------->
# <-------- multithreaded translation starts ---------->
if language == 'en->zh':
inputs_array = ["This is a Markdown file, translate it into Chinese, do not modify any existing Markdown commands:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
@@ -64,6 +78,11 @@ def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
f"\n\n{frag}" for frag in pfg.sp_file_contents]
inputs_show_user_array = [f"翻译 {f}" for f in pfg.sp_file_tag]
sys_prompt_array = ["You are a professional academic paper translator." for _ in range(n_split)]
else:
inputs_array = [f"This is a Markdown file, translate it into {language}, do not modify any existing Markdown commands, only answer me with translated results:" +
f"\n\n{frag}" for frag in pfg.sp_file_contents]
inputs_show_user_array = [f"翻译 {f}" for f in pfg.sp_file_tag]
sys_prompt_array = ["You are a professional academic paper translator." for _ in range(n_split)]
gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
inputs_array=inputs_array,
@@ -75,30 +94,48 @@ def 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, ch
# max_workers=5, # maximum parallelism allowed by OpenAI
scroller_max_len = 80
)
try:
pfg.sp_file_result = []
for i_say, gpt_say in zip(gpt_response_collection[0::2], gpt_response_collection[1::2]):
pfg.sp_file_result.append(gpt_say)
pfg.merge_result()
pfg.write_result(language)
except:
logging.error(trimmed_format_exc())
# <-------- collect the results and exit ---------->
create_report_file_name = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime()) + f"-chatgpt.polish.md"
res = write_results_to_file(gpt_response_collection, file_name=create_report_file_name)
create_report_file_name = gen_time_str() + f"-chatgpt.md"
res = write_history_to_file(gpt_response_collection, file_basename=create_report_file_name)
promote_file_to_downloadzone(res, chatbot=chatbot)
history = gpt_response_collection
chatbot.append((f"{fp}完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
def get_files_from_everything(txt):
import glob, os
def get_files_from_everything(txt, preference=''):
if txt == "": return False, None, None
success = True
if txt.startswith('http'):
# remote file on the web
txt = txt.replace("https://github.com/", "https://raw.githubusercontent.com/")
txt = txt.replace("/blob/", "/")
import requests
from toolbox import get_conf
proxies, = get_conf('proxies')
# remote file on the web
if preference == 'Github':
logging.info('正在从github下载资源 ...')
if not txt.endswith('.md'):
# Make a request to the GitHub API to retrieve the repository information
url = txt.replace("https://github.com/", "https://api.github.com/repos/") + '/readme'
response = requests.get(url, proxies=proxies)
txt = response.json()['download_url']
else:
txt = txt.replace("https://github.com/", "https://raw.githubusercontent.com/")
txt = txt.replace("/blob/", "/")
r = requests.get(txt, proxies=proxies)
with open('./gpt_log/temp.md', 'wb+') as f: f.write(r.content)
project_folder = './gpt_log/'
file_manifest = ['./gpt_log/temp.md']
download_local = f'{get_log_folder(plugin_name="批量Markdown翻译")}/raw-readme-{gen_time_str()}.md'
project_folder = f'{get_log_folder(plugin_name="批量Markdown翻译")}'
with open(download_local, 'wb+') as f: f.write(r.content)
file_manifest = [download_local]
elif txt.endswith('.md'):
# a file path was given directly
file_manifest = [txt]
@@ -108,6 +145,8 @@ def get_files_from_everything(txt):
project_folder = txt
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.md', recursive=True)]
else:
project_folder = None
file_manifest = []
success = False
return success, file_manifest, project_folder
@@ -120,11 +159,11 @@ def Markdown英译中(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_p
"函数插件功能?",
"对整个Markdown项目进行翻译。函数插件贡献者: Binary-Husky"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
disable_auto_promotion(chatbot)
# try importing the dependencies; if any is missing, suggest how to install it
try:
import tiktoken
import glob, os
except:
report_execption(chatbot, history,
a=f"解析项目: {txt}",
@@ -133,7 +172,7 @@ def Markdown英译中(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_p
return
history = [] # clear the history so the input does not overflow
success, file_manifest, project_folder = get_files_from_everything(txt)
success, file_manifest, project_folder = get_files_from_everything(txt, preference="Github")
if not success:
# nothing was found
@@ -160,11 +199,11 @@ def Markdown中译英(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_p
"函数插件功能?",
"对整个Markdown项目进行翻译。函数插件贡献者: Binary-Husky"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
disable_auto_promotion(chatbot)
# try importing the dependencies; if any is missing, suggest how to install it
try:
import tiktoken
import glob, os
except:
report_execption(chatbot, history,
a=f"解析项目: {txt}",
@@ -183,4 +222,40 @@ def Markdown中译英(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_p
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.md文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
yield from 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='zh->en')
yield from 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language='zh->en')
@CatchException
def Markdown翻译指定语言(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
# basic info: feature and contributor
chatbot.append([
"函数插件功能?",
"对整个Markdown项目进行翻译。函数插件贡献者: Binary-Husky"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
disable_auto_promotion(chatbot)
# try importing the dependencies; if any is missing, suggest how to install it
try:
import tiktoken
except:
report_execption(chatbot, history,
a=f"解析项目: {txt}",
b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade tiktoken```。")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
history = [] # clear the history so the input does not overflow
success, file_manifest, project_folder = get_files_from_everything(txt)
if not success:
# nothing was found
if txt == "": txt = '空空如也的输入栏'
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
if len(file_manifest) == 0:
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何.md文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
language = plugin_kwargs.get("advanced_arg", 'Chinese')
yield from 多文件翻译(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, language=language)
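With `preference='Github'`, `get_files_from_everything` now handles two kinds of GitHub link. A link to a `.md` file is rewritten to its `raw.githubusercontent.com` form, and a repository link becomes a GitHub REST API `.../readme` request whose JSON `download_url` points at the README. A minimal sketch of just the URL rewriting, with no network access; the function names are hypothetical.

```python
RAW_HOST = "https://raw.githubusercontent.com/"
WEB_HOST = "https://github.com/"
API_REPOS = "https://api.github.com/repos/"


def github_md_to_raw(url: str) -> str:
    """github.com/<owner>/<repo>/blob/<ref>/<path>.md -> raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>.md"""
    if not url.startswith(WEB_HOST):
        return url  # leave local paths and other hosts untouched
    return url.replace(WEB_HOST, RAW_HOST, 1).replace("/blob/", "/", 1)


def github_readme_api_url(repo_url: str) -> str:
    """github.com/<owner>/<repo> -> api.github.com/repos/<owner>/<repo>/readme"""
    if not repo_url.startswith(WEB_HOST):
        raise ValueError(f"not a github.com URL: {repo_url}")
    return repo_url.replace(WEB_HOST, API_REPOS, 1).rstrip("/") + "/readme"
```

Unlike the plugin code, this sketch strips a trailing slash before appending `/readme`, so `https://github.com/owner/repo/` does not produce a `//readme` path.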

@@ -1,121 +1,108 @@
from toolbox import update_ui
from toolbox import CatchException, report_execption, write_results_to_file
import re
import unicodedata
fast_debug = False
from toolbox import update_ui, promote_file_to_downloadzone, gen_time_str
from toolbox import CatchException, report_execption
from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import read_and_clean_pdf_text
from .crazy_utils import input_clipping
def is_paragraph_break(match):
"""
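Decide from the given match whether a newline marks a paragraph break.
If the character before the newline ends a sentence (period, exclamation mark, question mark) and the next character is an uppercase letter, the newline is more likely a paragraph break.
The length of the preceding text is also used to judge whether the paragraph is already long enough.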
"""
prev_char, next_char = match.groups()
# sentence-ending characters
sentence_endings = ".!?"
# minimum paragraph length threshold
min_paragraph_length = 140
if prev_char in sentence_endings and next_char.isupper() and len(match.string[:match.start(1)]) > min_paragraph_length:
return "\n\n"
else:
return " "
def normalize_text(text):
"""
Normalize text by converting ligatures and other special typographic symbols to their base forms.
For example, the ligature "ﬁ" is converted to "f" and "i".
"""
# normalize the text, decomposing ligatures
normalized_text = unicodedata.normalize("NFKD", text)
# strip any remaining special (non-ASCII) characters
cleaned_text = re.sub(r'[^\x00-\x7F]+', '', normalized_text)
return cleaned_text
def clean_text(raw_text):
"""
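Clean and format raw text extracted from a PDF.
1. Normalize the raw text.
2. Rejoin words hyphenated across lines, e.g. "Espe-\ncially" becomes "Especially".
3. Use heuristic rules to decide whether each newline is a paragraph break, and replace it accordingly.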
"""
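# normalize the text
normalized_text = normalize_text(raw_text)
# rejoin words hyphenated across line breaks
text = re.sub(r'(\w+-\n\w+)', lambda m: m.group(1).replace('-\n', ''), normalized_text)
# find newlines in the text based on their neighboring characters
newlines = re.compile(r'(\S)\n(\S)')
# per the heuristic rules, replace each newline with a space or a paragraph break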
final_text = re.sub(newlines, lambda m: m.group(1) + is_paragraph_break(m) + m.group(2), text)
return final_text.strip()
def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
import time, glob, os, fitz
print('begin analysis on:', file_manifest)
for index, fp in enumerate(file_manifest):
with fitz.open(fp) as doc:
file_content = ""
for page in doc:
file_content += page.get_text()
file_content = clean_text(file_content)
print(file_content)
file_write_buffer = []
for file_name in file_manifest:
print('begin analysis on:', file_name)
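############################## <Step 0: split the PDF> ##################################
# Split the PDF recursively; each chunk should ideally be one complete section (e.g. introduction, experiment), split further when necessary.
# Each chunk must be shorter than 2500 tokens.
file_content, page_one = read_and_clean_pdf_text(file_name) # try to split the PDF by section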
file_content = file_content.encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
page_one = str(page_one).encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
TOKEN_LIMIT_PER_FRAGMENT = 2500
prefix = "接下来请你逐文件分析下面的论文文件,概括其内容" if index==0 else ""
i_say = prefix + f'请对下面的文章片段用中文做一个概述,文件名是{os.path.relpath(fp, project_folder)},文章内容是 ```{file_content}```'
i_say_show_user = prefix + f'[{index}/{len(file_manifest)}] 请对下面的文章片段做一个概述: {os.path.abspath(fp)}'
chatbot.append((i_say_show_user, "[Local Message] waiting gpt response."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
from .crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf
from request_llm.bridge_all import model_info
enc = model_info["gpt-3.5-turbo"]['tokenizer']
def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
paper_fragments = breakdown_txt_to_satisfy_token_limit_for_pdf(
txt=file_content, get_token_fn=get_token_num, limit=TOKEN_LIMIT_PER_FRAGMENT)
page_one_fragments = breakdown_txt_to_satisfy_token_limit_for_pdf(
txt=str(page_one), get_token_fn=get_token_num, limit=TOKEN_LIMIT_PER_FRAGMENT//4)
# for better results, strip everything after the Introduction (if present)
paper_meta = page_one_fragments[0].split('introduction')[0].split('Introduction')[0].split('INTRODUCTION')[0]
############################## <Step 1: extract high-value information from the abstract into the history> ##################################
final_results = []
final_results.append(paper_meta)
if not fast_debug:
msg = '正常'
# ** gpt request **
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say,
inputs_show_user=i_say_show_user,
llm_kwargs=llm_kwargs,
chatbot=chatbot,
history=[],
sys_prompt="总结文章。"
) # with a timeout countdown
############################## <Step 2: iterate over the whole paper and extract condensed information> ##################################
i_say_show_user = f'首先你在中文语境下通读整篇论文。'; gpt_say = "[Local Message] 收到。" # prompt shown to the user
chatbot.append([i_say_show_user, gpt_say]); yield from update_ui(chatbot=chatbot, history=[]) # update the UI
chatbot[-1] = (i_say_show_user, gpt_say)
history.append(i_say_show_user); history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # refresh the UI
if not fast_debug: time.sleep(2)
iteration_results = []
last_iteration_result = paper_meta # the initial value is the abstract
MAX_WORD_TOTAL = 4096 * 0.7
n_fragment = len(paper_fragments)
if n_fragment >= 20: print('文章极长,不能达到预期效果')
for i in range(n_fragment):
NUM_OF_WORD = MAX_WORD_TOTAL // n_fragment
i_say = f"Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} Chinese characters: {paper_fragments[i]}"
i_say_show_user = f"[{i+1}/{n_fragment}] Read this section, recapitulate the content of this section with less than {NUM_OF_WORD} Chinese characters: {paper_fragments[i][:200]}"
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(i_say, i_say_show_user, # i_say = the prompt actually sent to ChatGPT, i_say_show_user = the prompt shown to the user
llm_kwargs, chatbot,
history=["The main idea of the previous section is?", last_iteration_result], # feed back the previous iteration's result
sys_prompt="Extract the main idea of this section with Chinese." # system prompt
)
iteration_results.append(gpt_say)
last_iteration_result = gpt_say
all_file = ', '.join([os.path.relpath(fp, project_folder) for index, fp in enumerate(file_manifest)])
i_say = f'根据以上你自己的分析,对全文进行概括,用学术性语言写一段中文摘要,然后再写一段英文摘要(包括{all_file})。'
chatbot.append((i_say, "[Local Message] waiting gpt response."))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
if not fast_debug:
msg = '正常'
# ** gpt request **
############################## <Step 3: tidy up the history and extract the summary> ##################################
final_results.extend(iteration_results)
final_results.append(f'Please conclude this paper discussed above。')
# This prompt is from https://github.com/kaixindelele/ChatPaper/blob/main/chat_paper.py
NUM_OF_WORD = 1000
i_say = """
1. Mark the title of the paper (with Chinese translation)
2. list all the authors' names (use English)
3. mark the first author's affiliation (output Chinese translation only)
4. mark the keywords of this article (use English)
5. link to the paper, Github code link (if available, fill in Github:None if not)
6. summarize according to the following four points.Be sure to use Chinese answers (proper nouns need to be marked in English)
- (1):What is the research background of this article?
- (2):What are the past methods? What are the problems with them? Is the approach well motivated?
- (3):What is the research methodology proposed in this paper?
- (4):On what task and what performance is achieved by the methods in this paper? Can the performance support their goals?
Follow the format of the output that follows:
1. Title: xxx\n\n
2. Authors: xxx\n\n
3. Affiliation: xxx\n\n
4. Keywords: xxx\n\n
5. Urls: xxx or xxx , xxx \n\n
6. Summary: \n\n
- (1):xxx;\n
- (2):xxx;\n
- (3):xxx;\n
- (4):xxx.\n\n
Be sure to use Chinese answers (proper nouns need to be marked in English), statements as concise and academic as possible,
do not have too much repetitive information, numerical values using the original numbers.
"""
# This prompt is from https://github.com/kaixindelele/ChatPaper/blob/main/chat_paper.py
file_write_buffer.extend(final_results)
i_say, final_results = input_clipping(i_say, final_results, max_token_limit=2000)
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say,
inputs_show_user=i_say,
llm_kwargs=llm_kwargs,
chatbot=chatbot,
history=history,
sys_prompt="总结文章。"
) # with a timeout countdown
inputs=i_say, inputs_show_user='开始最终总结',
llm_kwargs=llm_kwargs, chatbot=chatbot, history=final_results,
sys_prompt= f"Extract the main idea of this paper with less than {NUM_OF_WORD} Chinese characters"
)
final_results.append(gpt_say)
file_write_buffer.extend([i_say, gpt_say])
############################## <Step 4: set a token limit> ##################################
_, final_results = input_clipping("", final_results, max_token_limit=3200)
yield from update_ui(chatbot=chatbot, history=final_results) # note: the history is replaced here
chatbot[-1] = (i_say, gpt_say)
history.append(i_say); history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # refresh the UI
res = write_results_to_file(history)
chatbot.append(("完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # refresh the UI
res = write_history_to_file(file_write_buffer)
promote_file_to_downloadzone(res, chatbot=chatbot)
yield from update_ui(chatbot=chatbot, history=final_results) # refresh the UI
@CatchException
@@ -151,10 +138,7 @@ def 批量总结PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
return
# search for the list of files to process
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.pdf', recursive=True)] # + \
# [f for f in glob.glob(f'{project_folder}/**/*.tex', recursive=True)] + \
# [f for f in glob.glob(f'{project_folder}/**/*.cpp', recursive=True)] + \
# [f for f in glob.glob(f'{project_folder}/**/*.c', recursive=True)]
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.pdf', recursive=True)]
# if no files were found
if len(file_manifest) == 0:

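The final summary step calls `input_clipping` to keep the prompt plus accumulated history under a token budget (2000 and then 3200 tokens above). The helper's body is not part of this diff. The sketch below shows one simple clipping strategy. It assumes a whitespace word count in place of a real tokenizer, and `clip_history` is a hypothetical helper that repeatedly trims the currently longest history entry.

```python
from typing import List, Tuple

def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): one token per whitespace-separated word.
    return len(text.split())

def clip_history(inputs: str, history: List[str], max_tokens: int) -> Tuple[str, List[str]]:
    """Hypothetical helper: trim the longest history entry until everything fits."""
    history = list(history)  # work on a copy so the caller's list is untouched

    def total() -> int:
        return count_tokens(inputs) + sum(count_tokens(h) for h in history)

    while total() > max_tokens and history:
        # Pick the longest entry (first one on ties) and drop its last word.
        i = max(range(len(history)), key=lambda k: count_tokens(history[k]))
        words = history[i].split()
        if not words:
            break  # every entry is already empty; the inputs alone exceed the budget
        history[i] = " ".join(words[:-1])
    return inputs, history

_, clipped = clip_history("summarize", ["a b c d e f", "x y"], max_tokens=5)
print(clipped)  # -> ['a b', 'x y']
```

Trimming the longest entry first keeps short entries, such as the user's question, intact for as long as possible.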
@@ -1,6 +1,7 @@
from toolbox import update_ui
from toolbox import CatchException, report_execption, write_results_to_file
from toolbox import CatchException, report_execption
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from toolbox import write_history_to_file, promote_file_to_downloadzone
fast_debug = False
@@ -115,7 +116,8 @@ def 解析Paper(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbo
chatbot[-1] = (i_say, gpt_say)
history.append(i_say); history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # refresh the UI
res = write_results_to_file(history)
res = write_history_to_file(history)
promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot.append(("完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # refresh the UI

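This diff replaces `write_results_to_file(history)` with `write_history_to_file(history)`, followed by `promote_file_to_downloadzone(res, chatbot=chatbot)`. Neither helper's body appears here. The sketch below shows one plausible shape for the "write the history to a file and return its path" half. It assumes a Markdown layout in which even entries are prompts. `save_history_markdown` is a hypothetical name, not the project's API.

```python
import os
import time
from typing import List, Optional

def save_history_markdown(history: List[str], folder: str, file_name: Optional[str] = None) -> str:
    """Hypothetical helper: write alternating prompt/reply entries to a Markdown file."""
    os.makedirs(folder, exist_ok=True)
    if file_name is None:
        # Timestamped default name so runs at different times do not overwrite each other.
        file_name = time.strftime("%Y-%m-%d-%H-%M-%S") + ".md"
    path = os.path.join(folder, file_name)
    with open(path, "w", encoding="utf8") as f:
        f.write("# Chat history\n\n")
        for i, content in enumerate(history):
            if i % 2 == 0:
                f.write("## ")  # even index: treat as a prompt and render it as a heading
            f.write(str(content) + "\n\n")
    return os.path.abspath(path)
```

Returning the absolute path lets a caller hand the file straight to a download or "promote" step.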
@@ -0,0 +1,115 @@
from toolbox import CatchException, report_execption, get_log_folder, gen_time_str
from toolbox import update_ui, promote_file_to_downloadzone, update_ui_lastest_msg, disable_auto_promotion
from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from .crazy_utils import read_and_clean_pdf_text
from .pdf_fns.parse_pdf import parse_pdf, get_avail_grobid_url, translate_pdf
from colorful import *
import copy
import os
import math
import logging
def markdown_to_dict(article_content):
import markdown
from bs4 import BeautifulSoup
cur_t = ""
cur_c = ""
results = {}
for line in article_content:
if line.startswith('#'):
if cur_t!="":
if cur_t not in results:
results.update({cur_t:cur_c.lstrip('\n')})
else:
# handle sections with duplicate names
results.update({cur_t + " " + gen_time_str():cur_c.lstrip('\n')})
cur_t = line.rstrip('\n')
cur_c = ""
else:
cur_c += line
results_final = {}
for k in list(results.keys()):
if k.startswith('# '):
results_final['title'] = k.split('# ')[-1]
results_final['authors'] = results.pop(k).lstrip('\n')
if k.startswith('###### Abstract'):
results_final['abstract'] = results.pop(k).lstrip('\n')
results_final_sections = []
for k,v in results.items():
results_final_sections.append({
'heading':k.lstrip("# "),
'text':v if len(v) > 0 else f"The beginning of {k.lstrip('# ')} section."
})
results_final['sections'] = results_final_sections
return results_final
@CatchException
def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
disable_auto_promotion(chatbot)
# basic info: feature, contributor
chatbot.append([
"函数插件功能?",
"批量翻译PDF文档。函数插件贡献者: Binary-Husky"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# try to import the dependencies; if any are missing, suggest how to install them
try:
import nougat
import tiktoken
except:
report_execption(chatbot, history,
a=f"解析项目: {txt}",
b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade nougat-ocr tiktoken```。")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# clear the history to avoid input overflow
history = []
from .crazy_utils import get_files_from_everything
success, file_manifest, project_folder = get_files_from_everything(txt, type='.pdf')
# check the input; exit directly if no input was given
if not success:
if txt == "": txt = '空空如也的输入栏'
# if no files were found
if len(file_manifest) == 0:
report_execption(chatbot, history,
a=f"解析项目: {txt}", b=f"找不到任何.tex或.pdf文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# start the actual task
yield from 解析PDF_基于NOUGAT(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
def 解析PDF_基于NOUGAT(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
import copy
import tiktoken
TOKEN_LIMIT_PER_FRAGMENT = 1024
generated_conclusion_files = []
generated_html_files = []
DST_LANG = "中文"
from crazy_functions.crazy_utils import nougat_interface, construct_html
nougat_handle = nougat_interface()
for index, fp in enumerate(file_manifest):
chatbot.append(["当前进度:", f"正在解析论文,请稍候。第一次运行时,需要花费较长时间下载NOUGAT参数"]); yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
fpp = yield from nougat_handle.NOUGAT_parse_pdf(fp, chatbot, history)
promote_file_to_downloadzone(fpp, rename_file=os.path.basename(fpp)+'.nougat.mmd', chatbot=chatbot)
with open(fpp, 'r', encoding='utf8') as f:
article_content = f.readlines()
article_dict = markdown_to_dict(article_content)
logging.info(article_dict)
yield from translate_pdf(article_dict, llm_kwargs, chatbot, fp, generated_conclusion_files, TOKEN_LIMIT_PER_FRAGMENT, DST_LANG)
chatbot.append(("给出输出文件清单", str(generated_conclusion_files + generated_html_files)))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI

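`markdown_to_dict` turns NOUGAT's Markdown output into the `title` / `authors` / `abstract` / `sections` structure that `translate_pdf` consumes. The stdlib-only sketch below restates that grouping for readers; it is not the project's code. It assumes the same heading conventions (`# ` for the title, `###### Abstract` for the abstract), with two changes:

- Duplicate headings get a numeric suffix instead of a timestamp.
- The final section is stored after the loop ends, so trailing text is kept.

```python
from typing import Dict, List

def split_markdown_sections(lines: List[str]) -> Dict[str, object]:
    """Sketch: group Markdown lines under their most recent heading."""
    sections: Dict[str, str] = {}
    title, body = "", ""

    def flush() -> None:
        if title == "":
            return  # text before the first heading is ignored
        key, n = title, 1
        while key in sections:  # duplicate heading: add a numeric suffix (assumption)
            n += 1
            key = f"{title} ({n})"
        sections[key] = body.lstrip("\n")

    for line in lines:
        if line.startswith("#"):
            flush()
            title, body = line.rstrip("\n"), ""
        else:
            body += line
    flush()  # store the last section too

    result: Dict[str, object] = {}
    others = []
    for heading, text in sections.items():
        if heading.startswith("# "):
            result["title"] = heading[2:]
            result["authors"] = text
        elif heading.startswith("###### Abstract"):
            result["abstract"] = text
        else:
            name = heading.lstrip("# ")
            others.append({"heading": name, "text": text or f"The beginning of {name} section."})
    result["sections"] = others
    return result
```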
@@ -1,15 +1,19 @@
from toolbox import CatchException, report_execption, write_results_to_file
from toolbox import update_ui
from toolbox import CatchException, report_execption, get_log_folder, gen_time_str
from toolbox import update_ui, promote_file_to_downloadzone, update_ui_lastest_msg, disable_auto_promotion
from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from .crazy_utils import read_and_clean_pdf_text
from .pdf_fns.parse_pdf import parse_pdf, get_avail_grobid_url, translate_pdf
from colorful import *
import copy
import os
import math
@CatchException
def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, sys_prompt, web_port):
import glob
import os
def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
disable_auto_promotion(chatbot)
# basic info: feature, contributor
chatbot.append([
"函数插件功能?",
@@ -20,30 +24,22 @@ def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, sys_
try:
import fitz
import tiktoken
import scipdf
except:
report_execption(chatbot, history,
a=f"解析项目: {txt}",
b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade pymupdf tiktoken```。")
b=f"导入软件依赖失败。使用该模块需要额外依赖,安装方法```pip install --upgrade pymupdf tiktoken scipdf_parser```。")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# clear the history to avoid input overflow
history = []
from .crazy_utils import get_files_from_everything
success, file_manifest, project_folder = get_files_from_everything(txt, type='.pdf')
# check the input; exit directly if no input was given
if os.path.exists(txt):
project_folder = txt
else:
if txt == "":
txt = '空空如也的输入栏'
report_execption(chatbot, history,
a=f"解析项目: {txt}", b=f"找不到本地项目或无权访问: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return
# search for the list of files to process
file_manifest = [f for f in glob.glob(
f'{project_folder}/**/*.pdf', recursive=True)]
if not success:
if txt == "": txt = '空空如也的输入栏'
# if no files were found
if len(file_manifest) == 0:
@@ -53,18 +49,49 @@ def 批量翻译PDF文档(txt, llm_kwargs, plugin_kwargs, chatbot, history, sys_
return
# start the actual task
yield from 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, sys_prompt)
grobid_url = get_avail_grobid_url()
if grobid_url is not None:
yield from 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, grobid_url)
else:
yield from update_ui_lastest_msg("GROBID服务不可用,请检查config中的GROBID_URL。作为替代,现在将执行效果稍差的旧版代码。", chatbot, history, delay=3)
yield from 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, sys_prompt):
import os
import tiktoken
TOKEN_LIMIT_PER_FRAGMENT = 1280
def 解析PDF_基于GROBID(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, grobid_url):
import copy, json
TOKEN_LIMIT_PER_FRAGMENT = 1024
generated_conclusion_files = []
generated_html_files = []
DST_LANG = "中文"
from crazy_functions.crazy_utils import construct_html
for index, fp in enumerate(file_manifest):
chatbot.append(["当前进度:", f"正在连接GROBID服务,请稍候: {grobid_url}\n如果等待时间过长,请修改config中的GROBID_URL,可修改成本地GROBID服务。"]); yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
article_dict = parse_pdf(fp, grobid_url)
grobid_json_res = os.path.join(get_log_folder(), gen_time_str() + "grobid.json")
with open(grobid_json_res, 'w+', encoding='utf8') as f:
f.write(json.dumps(article_dict, indent=4, ensure_ascii=False))
promote_file_to_downloadzone(grobid_json_res, chatbot=chatbot)
if article_dict is None: raise RuntimeError("解析PDF失败,请检查PDF是否损坏。")
yield from translate_pdf(article_dict, llm_kwargs, chatbot, fp, generated_conclusion_files, TOKEN_LIMIT_PER_FRAGMENT, DST_LANG)
chatbot.append(("给出输出文件清单", str(generated_conclusion_files + generated_html_files)))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
"""
This function is deprecated.
"""
import copy
TOKEN_LIMIT_PER_FRAGMENT = 1024
generated_conclusion_files = []
generated_html_files = []
from crazy_functions.crazy_utils import construct_html
for index, fp in enumerate(file_manifest):
# read the PDF file
file_content, page_one = read_and_clean_pdf_text(fp)
file_content = file_content.encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
page_one = str(page_one).encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
# recursively split the PDF text
from .crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf
@@ -74,7 +101,7 @@ def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot,
paper_fragments = breakdown_txt_to_satisfy_token_limit_for_pdf(
txt=file_content, get_token_fn=get_token_num, limit=TOKEN_LIMIT_PER_FRAGMENT)
page_one_fragments = breakdown_txt_to_satisfy_token_limit_for_pdf(
txt=str(page_one), get_token_fn=get_token_num, limit=TOKEN_LIMIT_PER_FRAGMENT//4)
txt=page_one, get_token_fn=get_token_num, limit=TOKEN_LIMIT_PER_FRAGMENT//4)
# for better results, strip everything from the Introduction onward, if present
paper_meta = page_one_fragments[0].split('introduction')[0].split('Introduction')[0].split('INTRODUCTION')[0]
@@ -100,32 +127,59 @@ def 解析PDF(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot,
"请你作为一个学术翻译,负责把学术论文准确翻译成中文。注意文章中的每一句话都要翻译。" for _ in paper_fragments],
# max_workers=5 # maximum parallelism allowed by OpenAI
)
gpt_response_collection_md = copy.deepcopy(gpt_response_collection)
# format the report
for i,k in enumerate(gpt_response_collection):
for i,k in enumerate(gpt_response_collection_md):
if i%2==0:
gpt_response_collection[i] = f"\n\n---\n\n ## 原文[{i//2}/{len(gpt_response_collection)//2}] \n\n {paper_fragments[i//2].replace('#', '')} \n\n---\n\n ## 翻译[{i//2}/{len(gpt_response_collection)//2}]\n "
gpt_response_collection_md[i] = f"\n\n---\n\n ## 原文[{i//2}/{len(gpt_response_collection_md)//2}] \n\n {paper_fragments[i//2].replace('#', '')} \n\n---\n\n ## 翻译[{i//2}/{len(gpt_response_collection_md)//2}]\n "
else:
gpt_response_collection[i] = gpt_response_collection[i]
gpt_response_collection_md[i] = gpt_response_collection_md[i]
final = ["一、论文概况\n\n---\n\n", paper_meta_info.replace('# ', '### ') + '\n\n---\n\n', "二、论文翻译", ""]
final.extend(gpt_response_collection)
final.extend(gpt_response_collection_md)
create_report_file_name = f"{os.path.basename(fp)}.trans.md"
res = write_results_to_file(final, file_name=create_report_file_name)
res = write_history_to_file(final, create_report_file_name)
promote_file_to_downloadzone(res, chatbot=chatbot)
# update the UI
generated_conclusion_files.append(f'./gpt_log/{create_report_file_name}')
generated_conclusion_files.append(f'{get_log_folder()}/{create_report_file_name}')
chatbot.append((f"{fp}完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# write html
try:
ch = construct_html()
orig = ""
trans = ""
gpt_response_collection_html = copy.deepcopy(gpt_response_collection)
for i,k in enumerate(gpt_response_collection_html):
if i%2==0:
gpt_response_collection_html[i] = paper_fragments[i//2].replace('#', '')
else:
gpt_response_collection_html[i] = gpt_response_collection_html[i]
final = ["论文概况", paper_meta_info.replace('# ', '### '), "二、论文翻译", ""]
final.extend(gpt_response_collection_html)
for i, k in enumerate(final):
if i%2==0:
orig = k
if i%2==1:
trans = k
ch.add_row(a=orig, b=trans)
create_report_file_name = f"{os.path.basename(fp)}.trans.html"
generated_html_files.append(ch.save_file(create_report_file_name))
except:
from toolbox import trimmed_format_exc
print('writing html result failed:', trimmed_format_exc())
# prepare the files for download
import shutil
for pdf_path in generated_conclusion_files:
# rename the file
rename_file = f'./gpt_log/总结论文-{os.path.basename(pdf_path)}'
if os.path.exists(rename_file):
os.remove(rename_file)
shutil.copyfile(pdf_path, rename_file)
if os.path.exists(pdf_path):
os.remove(pdf_path)
chatbot.append(("给出输出文件清单", str(generated_conclusion_files)))
rename_file = f'翻译-{os.path.basename(pdf_path)}'
promote_file_to_downloadzone(pdf_path, rename_file=rename_file, chatbot=chatbot)
for html_path in generated_html_files:
# rename the file
rename_file = f'翻译-{os.path.basename(html_path)}'
promote_file_to_downloadzone(html_path, rename_file=rename_file, chatbot=chatbot)
chatbot.append(("给出输出文件清单", str(generated_conclusion_files + generated_html_files)))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI

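Both the Markdown and the HTML report loops rely on the multi-threaded request returning a flat list. In that list, even indices hold the source fragment and odd indices hold its translation. The body of `construct_html` is not part of this diff. The sketch below therefore uses hypothetical helpers, `pair_up` and `rows_to_html`, to show the pairing and a side-by-side table. The HTML layout is an assumption.

```python
import html
from typing import List, Tuple

def pair_up(flat: List[str]) -> List[Tuple[str, str]]:
    """Pair a flat [orig0, trans0, orig1, trans1, ...] list into (orig, trans) tuples."""
    if len(flat) % 2:
        raise ValueError("expected an even number of entries")
    return list(zip(flat[0::2], flat[1::2]))

def rows_to_html(pairs: List[Tuple[str, str]]) -> str:
    """Render one table row per pair; escaping keeps '<' in paper text from breaking the page."""
    rows = "".join(
        f"<tr><td>{html.escape(a)}</td><td>{html.escape(b)}</td></tr>" for a, b in pairs
    )
    return f"<table>{rows}</table>"
```

Slicing with `[0::2]` and `[1::2]` replaces the `i%2` branching and fails loudly when the list is unbalanced.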
@@ -0,0 +1,187 @@
from toolbox import CatchException, update_ui, gen_time_str
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from .crazy_utils import input_clipping
def inspect_dependency(chatbot, history):
# try to import the dependencies; if any are missing, suggest how to install them
try:
import manim
return True
except:
chatbot.append(["导入依赖失败", "使用该模块需要额外依赖,安装方法:```pip install manim manimgl```"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
return False
def eval_manim(code):
import subprocess, sys, os, shutil
with open('gpt_log/MyAnimation.py', 'w', encoding='utf8') as f:
f.write(code)
def get_class_name(class_string):
import re
# Use regex to extract the class name
class_name = re.search(r'class (\w+)\(', class_string).group(1)
return class_name
class_name = get_class_name(code)
try:
subprocess.check_output([sys.executable, '-c', f"from gpt_log.MyAnimation import {class_name}; {class_name}().render()"])
# compute the destination once so the returned path matches the moved file
dst = f'gpt_log/{class_name}-{gen_time_str()}.mp4'
shutil.move(f'media/videos/1080p60/{class_name}.mp4', dst)
return dst
except subprocess.CalledProcessError as e:
output = e.output.decode()
print(f"Command returned non-zero exit status {e.returncode}: {output}.")
return f"Evaluating python script failed: {e.output}."
except:
print('generating mp4 failed')
return "Generating mp4 failed."
def get_code_block(reply):
import re
pattern = r"```([\s\S]*?)```" # regex pattern to match code blocks
matches = re.findall(pattern, reply) # find all code blocks in text
if len(matches) != 1:
raise RuntimeError("GPT is not generating proper code.")
return matches[0].strip('python') # code block
@CatchException
def 动画生成(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
"""
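txt            text the user typed in the input box, e.g. a passage to translate or a path to the files to process
llm_kwargs     GPT model parameters such as temperature and top_p; usually passed through unchanged
plugin_kwargs  plugin parameters; currently unused
chatbot        handle to the chat display box, used to show output to the user
history        chat history (prior context)
system_prompt  silent system prompt for GPT
web_port       port the application is currently running on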
"""
# clear the history to avoid input overflow
history = []
# basic info: feature, contributor
chatbot.append([
"函数插件功能?",
"生成数学动画, 此插件处于开发阶段, 建议暂时不要使用, 作者: binary-husky, 插件初始化中 ..."
])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# try to import the dependencies; if any are missing, suggest how to install them
dep_ok = yield from inspect_dependency(chatbot=chatbot, history=history) # refresh the UI
if not dep_ok: return
# input
i_say = f'Generate an animation to show: ' + txt
demo = ["Here are some examples of manim", examples_of_manim()]
_, demo = input_clipping(inputs="", history=demo, max_token_limit=2560)
# start
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say, inputs_show_user=i_say,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=demo,
sys_prompt=
r"Write an animation script with 3blue1brown's manim. "+
r"Please begin with `from manim import *`. " +
r"Answer me with a code block wrapped by ```."
)
chatbot.append(["开始生成动画", "..."])
history.extend([i_say, gpt_say])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# turn the generated code into an animation
code = get_code_block(gpt_say)
res = eval_manim(code)
chatbot.append(("生成的视频文件路径", res))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# a few demos collected online, included to help GPT generate code
def examples_of_manim():
return r"""
```
class MovingGroupToDestination(Scene):
def construct(self):
group = VGroup(Dot(LEFT), Dot(ORIGIN), Dot(RIGHT, color=RED), Dot(2 * RIGHT)).scale(1.4)
dest = Dot([4, 3, 0], color=YELLOW)
self.add(group, dest)
self.play(group.animate.shift(dest.get_center() - group[2].get_center()))
self.wait(0.5)
```
```
class LatexWithMovingFramebox(Scene):
def construct(self):
text=MathTex(
"\\frac{d}{dx}f(x)g(x)=","f(x)\\frac{d}{dx}g(x)","+",
"g(x)\\frac{d}{dx}f(x)"
)
self.play(Write(text))
framebox1 = SurroundingRectangle(text[1], buff = .1)
framebox2 = SurroundingRectangle(text[3], buff = .1)
self.play(
Create(framebox1),
)
self.wait()
self.play(
ReplacementTransform(framebox1,framebox2),
)
self.wait()
```
```
class PointWithTrace(Scene):
def construct(self):
path = VMobject()
dot = Dot()
path.set_points_as_corners([dot.get_center(), dot.get_center()])
def update_path(path):
previous_path = path.copy()
previous_path.add_points_as_corners([dot.get_center()])
path.become(previous_path)
path.add_updater(update_path)
self.add(path, dot)
self.play(Rotating(dot, radians=PI, about_point=RIGHT, run_time=2))
self.wait()
self.play(dot.animate.shift(UP))
self.play(dot.animate.shift(LEFT))
self.wait()
```
```
# do not use get_graph, this function is deprecated
class ExampleFunctionGraph(Scene):
def construct(self):
cos_func = FunctionGraph(
lambda t: np.cos(t) + 0.5 * np.cos(7 * t) + (1 / 7) * np.cos(14 * t),
color=RED,
)
sin_func_1 = FunctionGraph(
lambda t: np.sin(t) + 0.5 * np.sin(7 * t) + (1 / 7) * np.sin(14 * t),
color=BLUE,
)
sin_func_2 = FunctionGraph(
lambda t: np.sin(t) + 0.5 * np.sin(7 * t) + (1 / 7) * np.sin(14 * t),
x_range=[-4, 4],
color=GREEN,
).move_to([0, 1, 0])
self.add(cos_func, sin_func_1, sin_func_2)
```
"""

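`动画生成` expects exactly one fenced block in the model reply, and `eval_manim` locates the scene class with a regex. A sketch of both steps follows, using the hypothetical names `extract_single_code_block` and `first_class_name`.

Note that `str.strip('python')` removes any of the characters p, y, t, h, o, n from both ends rather than a literal prefix. This version therefore drops a language tag on the opening fence line explicitly. The backtick fence is built as a constant only so the example can be shown inside a Markdown code block.

```python
import re

FENCE = "`" * 3  # three backticks, built programmatically for display purposes

def extract_single_code_block(reply: str) -> str:
    """Return the body of the only fenced block in reply, minus an optional language tag."""
    matches = re.findall(FENCE + r"([\s\S]*?)" + FENCE, reply)
    if len(matches) != 1:
        raise RuntimeError("expected exactly one code block")
    body = matches[0]
    # If the first line is a bare word such as "python", treat it as the language tag.
    first_line, sep, rest = body.partition("\n")
    if sep and re.fullmatch(r"[A-Za-z0-9_+-]*", first_line.strip()):
        body = rest
    return body.strip()

def first_class_name(code: str) -> str:
    """Name of the first top-level `class X(...)` definition in code."""
    m = re.search(r"^class\s+(\w+)\s*\(", code, flags=re.MULTILINE)
    if m is None:
        raise ValueError("no class definition found")
    return m.group(1)
```

Raising on "no class found" avoids the `AttributeError` that calling `.group(1)` on a `None` match would produce.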
@@ -13,7 +13,9 @@ def 解析PDF(file_name, llm_kwargs, plugin_kwargs, chatbot, history, system_pro
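# recursively split the PDF text; each fragment should ideally be one complete section (e.g. introduction, experiment), split further only when necessary
# each fragment must be shorter than 2500 tokens
file_content, page_one = read_and_clean_pdf_text(file_name) # try to split the PDF by section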
file_content = file_content.encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
page_one = str(page_one).encode('utf-8', 'ignore').decode() # avoid reading non-utf8 chars
TOKEN_LIMIT_PER_FRAGMENT = 2500
from .crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf

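The comments in this hunk describe recursive splitting: prefer whole sections, and cut further only when a fragment exceeds the token limit. The implementation of `breakdown_txt_to_satisfy_token_limit_for_pdf` is not shown in this diff, so the following is a generic sketch with these assumptions:

- A caller-supplied `count` function stands in for the tokenizer.
- Separators are tried from coarsest (blank line) to finest (space).
- Halving the text is the last resort.

```python
from typing import Callable, List, Sequence

def split_to_token_limit(text: str, count: Callable[[str], int], limit: int,
                         separators: Sequence[str] = ("\n\n", "\n", " ")) -> List[str]:
    """Recursively split text so that count(fragment) <= limit for every fragment."""
    # A single character cannot be split further, so it is returned as-is.
    if count(text) <= limit or len(text) <= 1:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) < 2:
            continue  # separator absent; try a finer one
        # Greedily pack consecutive parts until adding the next would overflow.
        pieces, current = [], parts[0]
        for part in parts[1:]:
            candidate = current + sep + part
            if count(candidate) <= limit:
                current = candidate
            else:
                pieces.append(current)
                current = part
        pieces.append(current)
        # Pieces that are still too large are split again, possibly with finer separators.
        out: List[str] = []
        for piece in pieces:
            if piece:
                out.extend(split_to_token_limit(piece, count, limit, separators))
        return out
    # No separator occurs in the text: fall back to cutting it in half.
    mid = len(text) // 2
    return (split_to_token_limit(text[:mid], count, limit, separators)
            + split_to_token_limit(text[mid:], count, limit, separators))
```

Each recursive call works on a strictly shorter string, so the recursion terminates.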
@@ -1,5 +1,6 @@
from toolbox import update_ui
from toolbox import CatchException, report_execption, write_results_to_file
from toolbox import CatchException, report_execption
from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
fast_debug = False
@@ -27,7 +28,8 @@ def 生成函数注释(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
if not fast_debug: time.sleep(2)
if not fast_debug:
res = write_results_to_file(history)
res = write_history_to_file(history)
promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot.append(("完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # refresh the UI

@@ -75,7 +75,11 @@ def 连接网络回答问题(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
proxies, = get_conf('proxies')
urls = google(txt, proxies)
history = []
if len(urls) == 0:
chatbot.append((f"结论:{txt}",
"[Local Message] 受到google限制,无法从google获取信息"))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI now, since the GPT request takes a while
return
# ------------- < Step 2: visit each web page in turn > -------------
max_search_result = 5 # maximum number of web pages to include
for index, url in enumerate(urls[:max_search_result]):

@@ -0,0 +1,106 @@
from toolbox import CatchException, update_ui
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive, input_clipping
import requests
from bs4 import BeautifulSoup
from request_llm.bridge_all import model_info
def bing_search(query, proxies=None):
query = query
url = f"https://cn.bing.com/search?q={query}"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36'}
response = requests.get(url, headers=headers, proxies=proxies)
soup = BeautifulSoup(response.content, 'html.parser')
results = []
for g in soup.find_all('li', class_='b_algo'):
anchors = g.find_all('a')
if anchors:
link = anchors[0]['href']
if not link.startswith('http'):
continue
title = g.find('h2').text
item = {'title': title, 'link': link}
results.append(item)
for r in results:
print(r['link'])
return results
def scrape_text(url, proxies) -> str:
"""Scrape text from a webpage
Args:
url (str): The URL to scrape text from
Returns:
str: The scraped text
"""
headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36',
'Content-Type': 'text/plain',
}
try:
response = requests.get(url, headers=headers, proxies=proxies, timeout=8)
if response.encoding == "ISO-8859-1": response.encoding = response.apparent_encoding
except:
return "无法连接到该网页"
soup = BeautifulSoup(response.text, "html.parser")
for script in soup(["script", "style"]):
script.extract()
text = soup.get_text()
lines = (line.strip() for line in text.splitlines())
chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
text = "\n".join(chunk for chunk in chunks if chunk)
return text
@CatchException
def 连接bing搜索回答问题(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
"""
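txt            text the user typed in the input box, e.g. a passage to translate or a path to the files to process
llm_kwargs     GPT model parameters such as temperature and top_p; usually passed through unchanged
plugin_kwargs  plugin parameters; currently unused
chatbot        handle to the chat display box, used to show output to the user
history        chat history (prior context)
system_prompt  silent system prompt for GPT
web_port       port the application is currently running on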
"""
history = [] # clear the history to avoid input overflow
chatbot.append((f"请结合互联网信息回答以下问题:{txt}",
"[Local Message] 请注意,您正在调用一个[函数插件]的模板,该模板可以实现ChatGPT联网信息综合。该函数面向希望实现更多有趣功能的开发者,它可以作为创建新功能函数的模板。您若希望分享新的功能模组,请不吝PR"))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI now, since the GPT request takes a while
# ------------- < Step 1: scrape the search engine results > -------------
from toolbox import get_conf
proxies, = get_conf('proxies')
urls = bing_search(txt, proxies)
history = []
if len(urls) == 0:
chatbot.append((f"结论:{txt}",
"[Local Message] 受到bing限制,无法从bing获取信息"))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI now, since the GPT request takes a while
return
# ------------- < Step 2: visit each web page in turn > -------------
max_search_result = 8 # maximum number of web pages to include
for index, url in enumerate(urls[:max_search_result]):
res = scrape_text(url['link'], proxies)
history.extend([f"{index}份搜索结果:", res])
chatbot.append([f"{index}份搜索结果:", res[:500]+"......"])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI now, since the GPT request takes a while
# ------------- < Step 3: synthesize the answer with ChatGPT > -------------
i_say = f"从以上搜索结果中抽取信息,然后回答问题:{txt}"
i_say, history = input_clipping( # clip the input, starting from the longest entries, to avoid exceeding the token limit
inputs=i_say,
history=history,
max_token_limit=model_info[llm_kwargs['llm_model']]['max_token']*3//4
)
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=i_say, inputs_show_user=i_say,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
sys_prompt="请从给定的若干条搜索结果中抽取信息,对最相关的两个搜索结果进行总结,然后回答问题。"
)
chatbot[-1] = (i_say, gpt_say)
history.append(i_say);history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI

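`bing_search` interpolates the raw query into the URL, and `scrape_text` normalizes page text line by line. The sketch below shows both ideas with the standard library only:

- `urlencode` percent-encodes characters such as `&`, spaces, or Chinese text.
- The normalizer strips each line and splits on runs of two spaces, so ordinary single spaces inside a sentence are preserved.

With `requests`, passing `params={'q': query}` to `requests.get` achieves the same encoding. The function names are hypothetical.

```python
from urllib.parse import urlencode

def build_search_url(query: str, base: str = "https://cn.bing.com/search") -> str:
    """Percent-encode the query instead of interpolating it raw into the URL."""
    return f"{base}?{urlencode({'q': query})}"

def normalize_page_text(text: str) -> str:
    """Strip each line, split on double spaces, and drop empty chunks."""
    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    return "\n".join(chunk for chunk in chunks if chunk)
```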
@@ -0,0 +1,180 @@
"""
Explanation of the Void Terminal Plugin:
Please describe in natural language what you want to do.
1. You can open the plugin's dropdown menu to explore various capabilities of this project, and then describe your needs in natural language, for example:
- "Please call the plugin to translate a PDF paper for me. I just uploaded the paper to the upload area."
- "Please use the plugin to translate a PDF paper, with the address being https://www.nature.com/articles/s41586-019-1724-z.pdf."
- "Generate an image with blooming flowers and lush green grass using the plugin."
- "Translate the README using the plugin. The GitHub URL is https://github.com/facebookresearch/co-tracker."
- "Translate an Arxiv paper for me. The Arxiv ID is 1812.10695. Remember to use the plugin and don't do it manually!"
- "I don't like the current interface color. Modify the configuration and change the theme to THEME="High-Contrast"."
- "Could you please explain the structure of the Transformer network?"
2. If you use keywords like "call the plugin xxx", "modify the configuration xxx", "please", etc., your intention can be recognized more accurately.
3. Your intention can be recognized more accurately when using powerful models like GPT4. This plugin is relatively new, so please feel free to provide feedback on GitHub.
4. Now, if you need to process a file, please upload the file (drag the file to the file upload area) or describe the path to the file.
5. If you don't need to upload a file, you can simply repeat your command again.
"""
explain_msg = """
## 虚空终端插件说明:
1. 请用**自然语言**描述您需要做什么。例如:
- 「请调用插件,为我翻译PDF论文,论文我刚刚放到上传区了」
- 「请调用插件翻译PDF论文,地址为https://openreview.net/pdf?id=rJl0r3R9KX」
- 「把Arxiv论文翻译成中文PDF,arxiv论文的ID是1812.10695,记得用插件!」
- 「生成一张图片,图中鲜花怒放,绿草如茵,用插件实现」
- 「用插件翻译README,Github网址是https://github.com/facebookresearch/co-tracker」
- 「我不喜欢当前的界面颜色,修改配置,把主题THEME更换为THEME="High-Contrast"」
- 「请调用插件,解析python源代码项目,代码我刚刚打包拖到上传区了」
- 「请问Transformer网络的结构是怎样的?」
2. 您可以打开插件下拉菜单以了解本项目的各种能力。
3. 如果您使用「调用插件xxx」、「修改配置xxx」、「请问」等关键词,您的意图可以被识别的更准确。
4. 建议使用 GPT3.5 或更强的模型,弱模型可能无法理解您的想法。该插件诞生时间不长,欢迎您前往Github反馈问题。
5. 现在,如果需要处理文件,请您上传文件(将文件拖动到文件上传区),或者描述文件所在的路径。
6. 如果不需要上传文件,现在您只需要再次重复一次您的指令即可。
"""
from pydantic import BaseModel, Field
from typing import List
from toolbox import CatchException, update_ui, is_the_upload_folder
from toolbox import update_ui_lastest_msg, disable_auto_promotion
from request_llm.bridge_all import predict_no_ui_long_connection
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from crazy_functions.crazy_utils import input_clipping
from crazy_functions.json_fns.pydantic_io import GptJsonIO, JsonStringError
from crazy_functions.vt_fns.vt_state import VoidTerminalState
from crazy_functions.vt_fns.vt_modify_config import modify_configuration_hot
from crazy_functions.vt_fns.vt_modify_config import modify_configuration_reboot
from crazy_functions.vt_fns.vt_call_plugin import execute_plugin
class UserIntention(BaseModel):
user_prompt: str = Field(description="the content of user input", default="")
intention_type: str = Field(description="the type of user intention, choose from ['ModifyConfiguration', 'ExecutePlugin', 'Chat']", default="ExecutePlugin")
user_provide_file: bool = Field(description="whether the user provides a path to a file", default=False)
user_provide_url: bool = Field(description="whether the user provides a url", default=False)
def chat(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_intention):
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=txt, inputs_show_user=txt,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=[],
sys_prompt=system_prompt
)
chatbot[-1] = [txt, gpt_say]
history.extend([txt, gpt_say])
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
pass
explain_intention_to_user = {
'Chat': "聊天对话",
'ExecutePlugin': "调用插件",
'ModifyConfiguration': "修改配置",
}
def analyze_intention_with_simple_rules(txt):
user_intention = UserIntention()
user_intention.user_prompt = txt
is_certain = False
if '请问' in txt:
is_certain = True
user_intention.intention_type = 'Chat'
if '用插件' in txt:
is_certain = True
user_intention.intention_type = 'ExecutePlugin'
if '修改配置' in txt:
is_certain = True
user_intention.intention_type = 'ModifyConfiguration'
return is_certain, user_intention
@CatchException
def 虚空终端(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
disable_auto_promotion(chatbot=chatbot)
# get the current Void Terminal state
state = VoidTerminalState.get_state(chatbot)
appendix_msg = ""
# detect the user's intention with simple keyword rules
is_certain, _ = analyze_intention_with_simple_rules(txt)
if is_the_upload_folder(txt):
state.set_state(chatbot=chatbot, key='has_provided_explaination', value=False)
appendix_msg = "\n\n**很好,您已经上传了文件**,现在请您描述您的需求。"
if is_certain or (state.has_provided_explaination):
# if the intention is clear, skip the explanation step
state.set_state(chatbot=chatbot, key='has_provided_explaination', value=True)
state.unlock_plugin(chatbot=chatbot)
yield from update_ui(chatbot=chatbot, history=history)
yield from 虚空终端主路由(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port)
return
else:
# if the intention is ambiguous, show the explanation
state.set_state(chatbot=chatbot, key='has_provided_explaination', value=True)
state.lock_plugin(chatbot=chatbot)
chatbot.append(("虚空终端状态:", explain_msg+appendix_msg))
yield from update_ui(chatbot=chatbot, history=history)
return
def 虚空终端主路由(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
history = []
chatbot.append(("虚空终端状态: ", f"正在执行任务: {txt}"))
yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
# ⭐ ⭐ ⭐ analyze the user's intention
is_certain, user_intention = analyze_intention_with_simple_rules(txt)
if not is_certain:
yield from update_ui_lastest_msg(
lastmsg=f"正在执行任务: {txt}\n\n分析用户意图中", chatbot=chatbot, history=history, delay=0)
gpt_json_io = GptJsonIO(UserIntention)
rf_req = "\nchoose from ['ModifyConfiguration', 'ExecutePlugin', 'Chat']"
inputs = "Analyze the intention of the user according to following user input: \n\n" + \
">> " + (txt+rf_req).rstrip('\n').replace('\n','\n>> ') + '\n\n' + gpt_json_io.format_instructions
run_gpt_fn = lambda inputs, sys_prompt: predict_no_ui_long_connection(
inputs=inputs, llm_kwargs=llm_kwargs, history=[], sys_prompt=sys_prompt, observe_window=[])
analyze_res = run_gpt_fn(inputs, "")
try:
user_intention = gpt_json_io.generate_output_auto_repair(analyze_res, run_gpt_fn)
lastmsg=f"正在执行任务: {txt}\n\n用户意图理解: 意图={explain_intention_to_user[user_intention.intention_type]}",
except JsonStringError as e:
yield from update_ui_lastest_msg(
lastmsg=f"正在执行任务: {txt}\n\n用户意图理解: 失败 当前语言模型({llm_kwargs['llm_model']})不能理解您的意图", chatbot=chatbot, history=history, delay=0)
return
else:
pass
yield from update_ui_lastest_msg(
lastmsg=f"正在执行任务: {txt}\n\n用户意图理解: 意图={explain_intention_to_user[user_intention.intention_type]}",
chatbot=chatbot, history=history, delay=0)
# 用户意图: 修改本项目的配置
if user_intention.intention_type == 'ModifyConfiguration':
yield from modify_configuration_reboot(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_intention)
# 用户意图: 调度插件
if user_intention.intention_type == 'ExecutePlugin':
yield from execute_plugin(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_intention)
# 用户意图: 聊天
if user_intention.intention_type == 'Chat':
yield from chat(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, user_intention)
return
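`虚空终端主路由` dispatches on `user_intention.intention_type` through three separate `if` checks. Below is a minimal sketch of the same dispatch written as a lookup table. It assumes the intent labels are the plain strings `'ModifyConfiguration'`, `'ExecutePlugin'` and `'Chat'`. The handlers are hypothetical, synchronous stand-ins for the project's generator functions.

```python
from typing import Callable, Dict

# Hypothetical handlers standing in for modify_configuration_reboot / execute_plugin / chat.
def handle_config(txt: str) -> str:
    return f"config: {txt}"

def handle_plugin(txt: str) -> str:
    return f"plugin: {txt}"

def handle_chat(txt: str) -> str:
    return f"chat: {txt}"

# One entry per intent label, so exactly one handler runs per request.
ROUTES: Dict[str, Callable[[str], str]] = {
    "ModifyConfiguration": handle_config,
    "ExecutePlugin": handle_plugin,
    "Chat": handle_chat,
}

def route(intention_type: str, txt: str) -> str:
    handler = ROUTES.get(intention_type)
    if handler is None:
        # Assumption: unknown labels fall back to chat (the original does nothing for them).
        return handle_chat(txt)
    return handler(txt)

print(route("ExecutePlugin", "translate paper.pdf"))  # plugin: translate paper.pdf
```

A table keeps the set of supported intents in one place. The chat fallback is a design choice of this sketch, not behaviour shown in the diff.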

@@ -1,5 +1,6 @@
from toolbox import update_ui
from toolbox import CatchException, report_execption, write_results_to_file
from toolbox import CatchException, report_execption
from toolbox import write_history_to_file, promote_file_to_downloadzone
fast_debug = True
@@ -67,6 +68,7 @@ def parseNotebook(filename, enable_markdown=1):
def ipynb解释(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
enable_markdown = plugin_kwargs.get("advanced_arg", "1")
try:
enable_markdown = int(enable_markdown)
@@ -109,7 +111,8 @@ def ipynb解释(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbo
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# <-------- 写入文件,退出 ---------->
res = write_results_to_file(history)
res = write_history_to_file(history)
promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot.append(("完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
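The hunk header references `parseNotebook(filename, enable_markdown=1)`, but its body is not shown in this diff. Below is a minimal sketch of reading an `.ipynb` file. It assumes only the standard notebook JSON layout: a top-level `cells` list whose entries carry `cell_type` and `source`. `parse_notebook_cells` is a hypothetical name, not the project's function.

```python
import json
from typing import List

def parse_notebook_cells(raw: str, enable_markdown: bool = True) -> List[str]:
    """Return the text of each code cell, plus markdown cells when enable_markdown is set."""
    nb = json.loads(raw)
    blocks: List[str] = []
    for cell in nb.get("cells", []):
        kind = cell.get("cell_type")
        if kind not in ("code", "markdown"):
            continue  # skip raw cells and anything unexpected
        if kind == "markdown" and not enable_markdown:
            continue
        src = cell.get("source", "")
        # Notebook files store source either as one string or as a list of lines.
        text = "".join(src) if isinstance(src, list) else src
        blocks.append(f"[{kind}]\n{text}")
    return blocks
```

In the plugin, `enable_markdown` comes from `int(plugin_kwargs.get("advanced_arg", "1"))`, so passing `0` there corresponds to `enable_markdown=False` here.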

@@ -1,12 +1,14 @@
from toolbox import update_ui
from toolbox import CatchException, report_execption, write_results_to_file
from toolbox import update_ui, promote_file_to_downloadzone, disable_auto_promotion
from toolbox import CatchException, report_execption, write_history_to_file
from .crazy_utils import input_clipping
def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
import os, copy
from .crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
msg = '正常'
disable_auto_promotion(chatbot=chatbot)
summary_batch_isolation = True
inputs_array = []
inputs_show_user_array = []
history_array = []
@@ -21,7 +23,7 @@ def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
file_content = f.read()
prefix = "接下来请你逐文件分析下面的工程" if index==0 else ""
i_say = prefix + f'请对下面的程序文件做一个概述文件名是{os.path.relpath(fp, project_folder)},文件代码是 ```{file_content}```'
i_say_show_user = prefix + f'[{index}/{len(file_manifest)}] 请对下面的程序文件做一个概述: {os.path.abspath(fp)}'
i_say_show_user = prefix + f'[{index}/{len(file_manifest)}] 请对下面的程序文件做一个概述: {fp}'
# 装载请求内容
inputs_array.append(i_say)
inputs_show_user_array.append(i_say_show_user)
@@ -42,7 +44,8 @@ def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
# 全部文件解析完成,结果写入文件,准备对工程源代码进行汇总分析
report_part_1 = copy.deepcopy(gpt_response_collection)
history_to_return = report_part_1
res = write_results_to_file(report_part_1)
res = write_history_to_file(report_part_1)
promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot.append(("完成?", "逐个文件分析已完成。" + res + "\n\n正在开始汇总。"))
yield from update_ui(chatbot=chatbot, history=history_to_return) # 刷新界面
@@ -59,10 +62,17 @@ def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
# 把“请对下面的程序文件做一个概述” 替换成 精简的 "文件名:{all_file[index]}"
for index, content in enumerate(this_iteration_gpt_response_collection):
if index%2==0: this_iteration_gpt_response_collection[index] = f"{file_rel_path[index//2]}" # 只保留文件名节省token
previous_iteration_files.extend([os.path.relpath(fp, project_folder) for index, fp in enumerate(this_iteration_file_manifest)])
this_iteration_files = [os.path.relpath(fp, project_folder) for index, fp in enumerate(this_iteration_file_manifest)]
previous_iteration_files.extend(this_iteration_files)
previous_iteration_files_string = ', '.join(previous_iteration_files)
current_iteration_focus = ', '.join([os.path.relpath(fp, project_folder) for index, fp in enumerate(this_iteration_file_manifest)])
i_say = f'用一张Markdown表格简要描述以下文件的功能{previous_iteration_files_string}。根据以上分析,用一句话概括程序的整体功能。'
current_iteration_focus = ', '.join(this_iteration_files)
if summary_batch_isolation: focus = current_iteration_focus
else: focus = previous_iteration_files_string
i_say = f'用一张Markdown表格简要描述以下文件的功能{focus}。根据以上分析,用一句话概括程序的整体功能。'
if last_iteration_result != "":
sys_prompt_additional = "已知某些代码的局部作用是:" + last_iteration_result + "\n请继续分析其他源代码,从而更全面地理解项目的整体功能。"
else:
sys_prompt_additional = ""
inputs_show_user = f'根据以上分析,对程序的整体功能和构架重新做出概括,由于输入长度限制,可能需要分组处理,本组文件为 {current_iteration_focus} + 已经汇总的文件组。'
this_iteration_history = copy.deepcopy(this_iteration_gpt_response_collection)
this_iteration_history.append(last_iteration_result)
@@ -71,16 +81,26 @@ def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
result = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=inputs, inputs_show_user=inputs_show_user, llm_kwargs=llm_kwargs, chatbot=chatbot,
history=this_iteration_history_feed, # 迭代之前的分析
sys_prompt="你是一个程序架构分析师,正在分析一个项目的源代码。")
report_part_2.extend([i_say, result])
last_iteration_result = result
sys_prompt="你是一个程序架构分析师,正在分析一个项目的源代码。" + sys_prompt_additional)
summary = "请用一句话概括这些文件的整体功能"
summary_result = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=summary,
inputs_show_user=summary,
llm_kwargs=llm_kwargs,
chatbot=chatbot,
history=[i_say, result], # 迭代之前的分析
sys_prompt="你是一个程序架构分析师,正在分析一个项目的源代码。" + sys_prompt_additional)
report_part_2.extend([i_say, result])
last_iteration_result = summary_result
file_manifest = file_manifest[batchsize:]
gpt_response_collection = gpt_response_collection[batchsize*2:]
############################## <END> ##################################
history_to_return.extend(report_part_2)
res = write_results_to_file(history_to_return)
res = write_history_to_file(history_to_return)
promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot.append(("完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history_to_return) # 刷新界面
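With `summary_batch_isolation = True`, each summarisation round names only the files in its own batch. The one-sentence summary from earlier rounds is passed along through `sys_prompt_additional`. Below is a minimal sketch of how the prompt focus differs between the two modes. `batch_focus` is hypothetical, and the batch size is a free parameter rather than the project's value.

```python
from typing import List

def batch_focus(files: List[str], batch_size: int, isolate: bool) -> List[str]:
    """Return, for each summary round, the comma-joined file list the prompt focuses on."""
    seen: List[str] = []
    focuses: List[str] = []
    for start in range(0, len(files), batch_size):
        current = files[start:start + batch_size]
        seen.extend(current)
        # Isolation: name only this batch; otherwise name every file processed so far.
        focuses.append(", ".join(current if isolate else seen))
    return focuses

print(batch_focus(["a.py", "b.py", "c.py"], 2, isolate=True))  # ['a.py, b.py', 'c.py']
```

Isolation keeps each prompt short as the project grows. The cumulative mode repeats every earlier file name in every round.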
@@ -89,9 +109,8 @@ def 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs,
def 解析项目本身(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
history = [] # 清空历史,以免输入溢出
import glob
file_manifest = [f for f in glob.glob('./*.py') if ('test_project' not in f) and ('gpt_log' not in f)] + \
[f for f in glob.glob('./crazy_functions/*.py') if ('test_project' not in f) and ('gpt_log' not in f)]+ \
[f for f in glob.glob('./request_llm/*.py') if ('test_project' not in f) and ('gpt_log' not in f)]
file_manifest = [f for f in glob.glob('./*.py')] + \
[f for f in glob.glob('./*/*.py')]
project_folder = './'
if len(file_manifest) == 0:
report_execption(chatbot, history, a = f"解析项目: {txt}", b = f"找不到任何python文件: {txt}")
@@ -117,6 +136,23 @@ def 解析一个Python项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
return
yield from 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
@CatchException
def 解析一个Matlab项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
history = [] # 清空历史,以免输入溢出
import glob, os
if os.path.exists(txt):
project_folder = txt
else:
if txt == "": txt = '空空如也的输入栏'
report_execption(chatbot, history, a = f"解析Matlab项目: {txt}", b = f"找不到本地项目或无权访问: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.m', recursive=True)]
if len(file_manifest) == 0:
report_execption(chatbot, history, a = f"解析Matlab项目: {txt}", b = f"找不到任何`.m`源文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
yield from 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
@CatchException
def 解析一个C项目的头文件(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
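The new Matlab parser and the Rust parser both build `file_manifest` with `glob.glob(f'{project_folder}/**/*.ext', recursive=True)`. Below is a minimal sketch of the same pattern generalised to a list of extensions; `collect_sources` is a hypothetical helper. With `recursive=True`, `**` also matches zero directories, so files placed directly in `project_folder` are included.

```python
import glob
import os
from typing import Iterable, List

def collect_sources(project_folder: str, extensions: Iterable[str]) -> List[str]:
    """Recursively list files under project_folder whose name ends with one of extensions."""
    manifest: List[str] = []
    for ext in extensions:
        # '**' only descends into subdirectories when recursive=True is passed.
        pattern = os.path.join(project_folder, "**", f"*{ext}")
        manifest.extend(glob.glob(pattern, recursive=True))
    return sorted(manifest)
```

For the Rust case, the call would be `collect_sources(txt, [".rs", ".toml", ".lock"])`.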
@@ -232,6 +268,25 @@ def 解析一个Golang项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
return
yield from 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
@CatchException
def 解析一个Rust项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
history = [] # 清空历史,以免输入溢出
import glob, os
if os.path.exists(txt):
project_folder = txt
else:
if txt == "": txt = '空空如也的输入栏'
report_execption(chatbot, history, a=f"解析项目: {txt}", b=f"找不到本地项目或无权访问: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
file_manifest = [f for f in glob.glob(f'{project_folder}/**/*.rs', recursive=True)] + \
[f for f in glob.glob(f'{project_folder}/**/*.toml', recursive=True)] + \
[f for f in glob.glob(f'{project_folder}/**/*.lock', recursive=True)]
if len(file_manifest) == 0:
report_execption(chatbot, history, a=f"解析项目: {txt}", b=f"找不到任何Rust文件: {txt}")
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
yield from 解析源代码新(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
@CatchException
def 解析一个Lua项目(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):

@@ -6,7 +6,7 @@ def 同时问询(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt
"""
txt 输入栏用户输入的文本,例如需要翻译的一段话,再例如一个包含了待处理文件的路径
llm_kwargs gpt模型参数,如温度和top_p等,一般原样传递下去就行
plugin_kwargs 插件模型的参数,如温度和top_p等,一般原样传递下去就行
plugin_kwargs 插件模型的参数,用于灵活调整复杂功能的各种参数
chatbot 聊天显示框的句柄,用于显示给用户
history 聊天历史,前情提要
system_prompt 给gpt的静默提醒
@@ -35,18 +35,21 @@ def 同时问询_指定模型(txt, llm_kwargs, plugin_kwargs, chatbot, history,
"""
txt 输入栏用户输入的文本,例如需要翻译的一段话,再例如一个包含了待处理文件的路径
llm_kwargs gpt模型参数,如温度和top_p等,一般原样传递下去就行
plugin_kwargs 插件模型的参数,如温度和top_p等,一般原样传递下去就行
plugin_kwargs 插件模型的参数,用于灵活调整复杂功能的各种参数
chatbot 聊天显示框的句柄,用于显示给用户
history 聊天历史,前情提要
system_prompt 给gpt的静默提醒
web_port 当前软件运行的端口号
"""
history = [] # 清空历史,以免输入溢出
chatbot.append((txt, "正在同时咨询ChatGPT和ChatGLM……"))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 由于请求gpt需要一段时间,我们先及时地做一次界面更新
if ("advanced_arg" in plugin_kwargs) and (plugin_kwargs["advanced_arg"] == ""): plugin_kwargs.pop("advanced_arg")
# llm_kwargs['llm_model'] = 'chatglm&gpt-3.5-turbo&api2d-gpt-3.5-turbo' # 支持任意数量的llm接口,用&符号分隔
llm_kwargs['llm_model'] = plugin_kwargs.get("advanced_arg", 'chatglm&gpt-3.5-turbo') # 'chatglm&gpt-3.5-turbo' # 支持任意数量的llm接口,用&符号分隔
chatbot.append((txt, f"正在同时咨询{llm_kwargs['llm_model']}"))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 由于请求gpt需要一段时间,我们先及时地做一次界面更新
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=txt, inputs_show_user=txt,
llm_kwargs=llm_kwargs, chatbot=chatbot, history=history,
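`同时问询_指定模型` reads a model list from the advanced argument. Several model names are joined by `&`, and an empty argument is treated as absent. The diff only shows the joined string being stored in `llm_kwargs['llm_model']`; where the project splits it is not shown here. Below is a minimal sketch of that parsing step. `resolve_models` is hypothetical, and trimming whitespace around each name is an added assumption.

```python
from typing import Dict, List

DEFAULT_MODELS = "chatglm&gpt-3.5-turbo"  # the default string used in the diff

def resolve_models(plugin_kwargs: Dict[str, str]) -> List[str]:
    """Split the '&'-separated model string, falling back to the default when it is empty."""
    raw = plugin_kwargs.get("advanced_arg", "")
    if raw.strip() == "":
        raw = DEFAULT_MODELS
    # Drop empty pieces produced by a trailing '&' or surrounding whitespace.
    return [m.strip() for m in raw.split("&") if m.strip()]
```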

@@ -0,0 +1,199 @@
from toolbox import update_ui
from toolbox import CatchException, get_conf, markdown_convertion
from crazy_functions.crazy_utils import input_clipping
from request_llm.bridge_all import predict_no_ui_long_connection
import threading, time
import numpy as np
from .live_audio.aliyunASR import AliyunASR
import json
class WatchDog():
def __init__(self, timeout, bark_fn, interval=3, msg="") -> None:
self.last_feed = None
self.timeout = timeout
self.bark_fn = bark_fn
self.interval = interval
self.msg = msg
self.kill_dog = False
def watch(self):
while True:
if self.kill_dog: break
if time.time() - self.last_feed > self.timeout:
if len(self.msg) > 0: print(self.msg)
self.bark_fn()
break
time.sleep(self.interval)
def begin_watch(self):
self.last_feed = time.time()
th = threading.Thread(target=self.watch)
th.daemon = True
th.start()
def feed(self):
self.last_feed = time.time()
def chatbot2history(chatbot):
history = []
for c in chatbot:
for q in c:
if q not in ["[请讲话]", "[等待GPT响应]", "[正在等您说完问题]"]:
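history.append(q.strip().removeprefix('<div class="markdown-body">').removesuffix('</div>').strip().removeprefix('<p>').removesuffix('</p>'))  # str.strip() takes a character set, not a prefix; removeprefix/removesuffix need Python 3.9+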
return history
class AsyncGptTask():
def __init__(self) -> None:
self.observe_future = []
self.observe_future_chatbot_index = []
def gpt_thread_worker(self, i_say, llm_kwargs, history, sys_prompt, observe_window, index):
try:
MAX_TOKEN_ALLO = 2560
i_say, history = input_clipping(i_say, history, max_token_limit=MAX_TOKEN_ALLO)
gpt_say_partial = predict_no_ui_long_connection(inputs=i_say, llm_kwargs=llm_kwargs, history=history, sys_prompt=sys_prompt,
observe_window=observe_window[index], console_slience=True)
except ConnectionAbortedError as token_exceed_err:
print('至少一个线程任务Token溢出而失败', token_exceed_err)
except Exception as e:
print('至少一个线程任务意外失败', e)
def add_async_gpt_task(self, i_say, chatbot_index, llm_kwargs, history, system_prompt):
self.observe_future.append([""])
self.observe_future_chatbot_index.append(chatbot_index)
cur_index = len(self.observe_future)-1
th_new = threading.Thread(target=self.gpt_thread_worker, args=(i_say, llm_kwargs, history, system_prompt, self.observe_future, cur_index))
th_new.daemon = True
th_new.start()
def update_chatbot(self, chatbot):
for of, ofci in zip(self.observe_future, self.observe_future_chatbot_index):
try:
chatbot[ofci] = list(chatbot[ofci])
chatbot[ofci][1] = markdown_convertion(of[0])
except:
self.observe_future = []
self.observe_future_chatbot_index = []
return chatbot
class InterviewAssistant(AliyunASR):
def __init__(self):
self.capture_interval = 0.5 # second
self.stop = False
self.parsed_text = "" # 下个句子中已经说完的部分, 由 test_on_result_chg() 写入
self.parsed_sentence = "" # 某段话的整个句子,由 test_on_sentence_end() 写入
self.buffered_sentence = "" #
self.event_on_result_chg = threading.Event()
self.event_on_entence_end = threading.Event()
self.event_on_commit_question = threading.Event()
def __del__(self):
self.stop = True
self.stop_msg = ""
self.commit_wd.kill_dog = True
self.plugin_wd.kill_dog = True
def init(self, chatbot):
# 初始化音频采集线程
self.captured_audio = np.array([])
self.keep_latest_n_second = 10
self.commit_after_pause_n_second = 2.0
self.ready_audio_flagment = None
self.stop = False
self.plugin_wd = WatchDog(timeout=5, bark_fn=self.__del__, msg="程序终止")
self.aut = threading.Thread(target=self.audio_convertion_thread, args=(chatbot._cookies['uuid'],))
self.aut.daemon = True
self.aut.start()
# th2 = threading.Thread(target=self.audio2txt_thread, args=(chatbot._cookies['uuid'],))
# th2.daemon = True
# th2.start()
def no_audio_for_a_while(self):
if len(self.buffered_sentence) < 7: # 如果一句话小于7个字,暂不提交
self.commit_wd.begin_watch()
else:
self.event_on_commit_question.set()
def begin(self, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
# main plugin function
self.init(chatbot)
chatbot.append(["[请讲话]", "[正在等您说完问题]"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
self.plugin_wd.begin_watch()
self.agt = AsyncGptTask()
self.commit_wd = WatchDog(timeout=self.commit_after_pause_n_second, bark_fn=self.no_audio_for_a_while, interval=0.2)
self.commit_wd.begin_watch()
while not self.stop:
self.event_on_result_chg.wait(timeout=0.25) # run once every 0.25 second
chatbot = self.agt.update_chatbot(chatbot) # 将子线程的gpt结果写入chatbot
history = chatbot2history(chatbot)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
self.plugin_wd.feed()
if self.event_on_result_chg.is_set():
# called when some words have finished
self.event_on_result_chg.clear()
chatbot[-1] = list(chatbot[-1])
chatbot[-1][0] = self.buffered_sentence + self.parsed_text
history = chatbot2history(chatbot)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
self.commit_wd.feed()
if self.event_on_entence_end.is_set():
# called when a sentence has ended
self.event_on_entence_end.clear()
self.parsed_text = self.parsed_sentence
self.buffered_sentence += self.parsed_text
chatbot[-1] = list(chatbot[-1])
chatbot[-1][0] = self.buffered_sentence
history = chatbot2history(chatbot)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
if self.event_on_commit_question.is_set():
# called when a question should be commited
self.event_on_commit_question.clear()
if len(self.buffered_sentence) == 0: raise RuntimeError
self.commit_wd.begin_watch()
chatbot[-1] = list(chatbot[-1])
chatbot[-1] = [self.buffered_sentence, "[等待GPT响应]"]
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# add gpt task 创建子线程请求gpt,避免线程阻塞
history = chatbot2history(chatbot)
self.agt.add_async_gpt_task(self.buffered_sentence, len(chatbot)-1, llm_kwargs, history, system_prompt)
self.buffered_sentence = ""
chatbot.append(["[请讲话]", "[正在等您说完问题]"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
if len(self.stop_msg) != 0:
raise RuntimeError(self.stop_msg)
@CatchException
def 语音助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
# pip install -U openai-whisper
chatbot.append(["对话助手函数插件:使用时,双手离开鼠标键盘吧", "音频助手, 正在听您讲话(点击“停止”键可终止程序)..."])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
# 尝试导入依赖,如果缺少依赖,则给出安装建议
try:
import nls
from scipy import io
except:
chatbot.append(["导入依赖失败", "使用该模块需要额外依赖, 安装方法:```pip install --upgrade aliyun-python-sdk-core==2.13.3 pyOpenSSL scipy git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git```"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
APPKEY, = get_conf('ALIYUN_APPKEY')  # get_conf returns a sequence here, as in `proxies, = get_conf('proxies')`
if APPKEY == "":
chatbot.append(["导入依赖失败", "没有阿里云语音识别APPKEY和TOKEN, 详情见https://help.aliyun.com/document_detail/450255.html"])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
return
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
ia = InterviewAssistant()
yield from ia.begin(llm_kwargs, plugin_kwargs, chatbot, history, system_prompt)
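The `WatchDog` class in the voice-assistant plugin calls `bark_fn` when `feed()` has not been called for `timeout` seconds. The plugin uses one instance to stop once the UI stops polling, and another to commit a question after a pause in speech. Below is a minimal sketch of the same pattern, not the plugin's code. It assumes a `threading.Event` for shutdown instead of the original `kill_dog` flag, and `time.monotonic()` instead of `time.time()` so that wall-clock adjustments cannot trigger it.

```python
import threading
import time
from typing import Callable, Optional

class Watchdog:
    """Call bark_fn once if feed() is not called within `timeout` seconds."""

    def __init__(self, timeout: float, bark_fn: Callable[[], None], interval: float = 0.05) -> None:
        self.timeout = timeout
        self.bark_fn = bark_fn
        self.interval = interval
        self._last_feed = time.monotonic()
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def begin_watch(self) -> None:
        self.feed()
        self._thread = threading.Thread(target=self._watch, daemon=True)
        self._thread.start()

    def feed(self) -> None:
        self._last_feed = time.monotonic()

    def kill(self) -> None:
        self._stop.set()

    def _watch(self) -> None:
        # Event.wait doubles as an interruptible sleep, so kill() takes effect quickly.
        while not self._stop.wait(self.interval):
            if time.monotonic() - self._last_feed > self.timeout:
                self.bark_fn()
                return
```

As in the plugin, the watcher runs on a daemon thread and fires at most once per `begin_watch()`.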

@@ -1,7 +1,7 @@
from toolbox import update_ui
from toolbox import CatchException, report_execption, write_results_to_file
from toolbox import CatchException, report_execption
from toolbox import write_history_to_file, promote_file_to_downloadzone
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
fast_debug = False
def 解析Paper(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt):
@@ -17,32 +17,29 @@ def 解析Paper(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbo
chatbot.append((i_say_show_user, "[Local Message] waiting gpt response."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
if not fast_debug:
msg = '正常'
# ** gpt request **
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(i_say, i_say_show_user, llm_kwargs, chatbot, history=[], sys_prompt=system_prompt) # 带超时倒计时
chatbot[-1] = (i_say_show_user, gpt_say)
history.append(i_say_show_user); history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # 刷新界面
if not fast_debug: time.sleep(2)
msg = '正常'
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(i_say, i_say_show_user, llm_kwargs, chatbot, history=[], sys_prompt=system_prompt) # 带超时倒计时
chatbot[-1] = (i_say_show_user, gpt_say)
history.append(i_say_show_user); history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # 刷新界面
time.sleep(2)
all_file = ', '.join([os.path.relpath(fp, project_folder) for index, fp in enumerate(file_manifest)])
i_say = f'根据以上你自己的分析,对全文进行概括,用学术性语言写一段中文摘要,然后再写一段英文摘要(包括{all_file})。'
chatbot.append((i_say, "[Local Message] waiting gpt response."))
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
if not fast_debug:
msg = '正常'
# ** gpt request **
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(i_say, i_say, llm_kwargs, chatbot, history=history, sys_prompt=system_prompt) # 带超时倒计时
msg = '正常'
# ** gpt request **
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(i_say, i_say, llm_kwargs, chatbot, history=history, sys_prompt=system_prompt) # 带超时倒计时
chatbot[-1] = (i_say, gpt_say)
history.append(i_say); history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # 刷新界面
res = write_results_to_file(history)
chatbot.append(("完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # 刷新界面
chatbot[-1] = (i_say, gpt_say)
history.append(i_say); history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # 刷新界面
res = write_history_to_file(history)
promote_file_to_downloadzone(res, chatbot=chatbot)
chatbot.append(("完成了吗?", res))
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # 刷新界面

@@ -1,26 +1,75 @@
from .crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
from toolbox import CatchException, report_execption, write_results_to_file
from toolbox import update_ui
from toolbox import CatchException, report_execption, promote_file_to_downloadzone
from toolbox import update_ui, update_ui_lastest_msg, disable_auto_promotion, write_history_to_file
import logging
import requests
import time
import random
ENABLE_ALL_VERSION_SEARCH = True
def get_meta_information(url, chatbot, history):
import requests
import arxiv
import difflib
import re
from bs4 import BeautifulSoup
from toolbox import get_conf
from urllib.parse import urlparse
session = requests.session()
proxies, = get_conf('proxies')
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
'Cache-Control':'max-age=0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
'Connection': 'keep-alive'
}
# 发送 GET 请求
response = requests.get(url, proxies=proxies, headers=headers)
session.proxies.update(proxies)
session.headers.update(headers)
response = session.get(url)
# 解析网页内容
soup = BeautifulSoup(response.text, "html.parser")
def string_similar(s1, s2):
return difflib.SequenceMatcher(None, s1, s2).quick_ratio()
if ENABLE_ALL_VERSION_SEARCH:
def search_all_version(url):
time.sleep(random.randint(1,5)) # 睡一会防止触发google反爬虫
response = session.get(url)
soup = BeautifulSoup(response.text, "html.parser")
for result in soup.select(".gs_ri"):
try:
url = result.select_one(".gs_rt").a['href']
except:
continue
arxiv_id = extract_arxiv_id(url)
if not arxiv_id:
continue
search = arxiv.Search(
id_list=[arxiv_id],
max_results=1,
sort_by=arxiv.SortCriterion.Relevance,
)
try: paper = next(search.results())
except: paper = None
return paper
return None
def extract_arxiv_id(url):
# 返回给定的url解析出的arxiv_id,如url未成功匹配返回None
pattern = r'arxiv\.org/abs/([^/]+)'
match = re.search(pattern, url)
if match:
return match.group(1)
else:
return None
profile = []
# 获取所有文章的标题和作者
for result in soup.select(".gs_ri"):
@@ -31,28 +80,45 @@ def get_meta_information(url, chatbot, history):
except:
citation = 'cited by 0'
abstract = result.select_one(".gs_rs").text.strip() # 摘要在 .gs_rs 中的文本,需要清除首尾空格
# 首先在arxiv上搜索,获取文章摘要
search = arxiv.Search(
query = title,
max_results = 1,
sort_by = arxiv.SortCriterion.Relevance,
)
paper = next(search.results())
if string_similar(title, paper.title) > 0.90: # same paper
try: paper = next(search.results())
except: paper = None
is_match = paper is not None and string_similar(title, paper.title) > 0.90
# 如果在Arxiv上匹配失败,检索文章的历史版本的题目
if not is_match and ENABLE_ALL_VERSION_SEARCH:
other_versions_page_url = [tag['href'] for tag in result.select_one('.gs_flb').select('.gs_nph') if 'cluster' in tag['href']]
if len(other_versions_page_url) > 0:
other_versions_page_url = other_versions_page_url[0]
paper = search_all_version('http://' + urlparse(url).netloc + other_versions_page_url)
is_match = paper is not None and string_similar(title, paper.title) > 0.90
if is_match:
# same paper
abstract = paper.summary.replace('\n', ' ')
is_paper_in_arxiv = True
else: # different paper
else:
# different paper
abstract = abstract
is_paper_in_arxiv = False
paper = next(search.results())
print(title)
print(author)
print(citation)
logging.info('[title]:' + title)
logging.info('[author]:' + author)
logging.info('[citation]:' + citation)
profile.append({
'title':title,
'author':author,
'citation':citation,
'abstract':abstract,
'is_paper_in_arxiv':is_paper_in_arxiv,
'title': title,
'author': author,
'citation': citation,
'abstract': abstract,
'is_paper_in_arxiv': is_paper_in_arxiv,
})
chatbot[-1] = [chatbot[-1][0], title + f'\n\n是否在arxiv中(不在arxiv中无法获取完整摘要):{is_paper_in_arxiv}\n\n' + abstract]
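`get_meta_information` accepts an arXiv result only when `difflib.SequenceMatcher(None, s1, s2).quick_ratio()` exceeds 0.90. Below is a minimal sketch of that check; `is_same_paper` is a hypothetical name. The case and whitespace normalisation is an added assumption, since the original compares the raw titles. `quick_ratio()` only compares character counts and ignores their order, so `ratio()` is the stricter choice if false matches become a problem.

```python
import difflib

def is_same_paper(scholar_title: str, arxiv_title: str, threshold: float = 0.90) -> bool:
    """Treat two titles as the same paper when their similarity exceeds threshold."""
    # Normalise case and collapse runs of whitespace before comparing (assumption).
    a = " ".join(scholar_title.lower().split())
    b = " ".join(arxiv_title.lower().split())
    # quick_ratio() is an upper bound on ratio(): cheap, but it can over-estimate.
    return difflib.SequenceMatcher(None, a, b).quick_ratio() > threshold
```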
@@ -61,6 +127,7 @@ def get_meta_information(url, chatbot, history):
@CatchException
def 谷歌检索小助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
disable_auto_promotion(chatbot=chatbot)
# 基本信息:功能、贡献者
chatbot.append([
"函数插件功能?",
@@ -82,6 +149,9 @@ def 谷歌检索小助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
# 清空历史,以免输入溢出
history = []
meta_paper_info_list = yield from get_meta_information(txt, chatbot, history)
if len(meta_paper_info_list) == 0:
yield from update_ui_lastest_msg(lastmsg='获取文献失败,可能触发了google反爬虫机制。',chatbot=chatbot, history=history, delay=0)
return
batchsize = 5
for batch in range(math.ceil(len(meta_paper_info_list)/batchsize)):
if len(meta_paper_info_list[:batchsize]) > 0:
@@ -103,6 +173,7 @@ def 谷歌检索小助手(txt, llm_kwargs, plugin_kwargs, chatbot, history, syst
"已经全部完成,您可以试试让AI写一个Related Works,例如您可以继续输入Write a \"Related Works\" section about \"你搜索的研究领域\" for me."])
msg = '正常'
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # 刷新界面
res = write_results_to_file(history)
chatbot.append(("完成了吗?", res));
path = write_history_to_file(history)
promote_file_to_downloadzone(path, chatbot=chatbot)
chatbot.append(("完成了吗?", path));
yield from update_ui(chatbot=chatbot, history=history, msg=msg) # 刷新界面
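The new `extract_arxiv_id` helper pulls the identifier out of a Google Scholar result link. Below is a minimal sketch of the same idea. As assumptions not present in the diff, it escapes the `.` and excludes query strings and fragments from the captured id. Only `arxiv.org/abs/` links are recognised, so `/pdf/` links still return `None`. Old-style identifiers such as `hep-th/9901001` contain a slash and are truncated by this pattern, as they are by the original.

```python
import re
from typing import Optional

# The escaped dot matches only a literal '.'; the id stops at '/', '?' or '#'.
ARXIV_ABS = re.compile(r"arxiv\.org/abs/([^/?#]+)")

def extract_arxiv_id(url: str) -> Optional[str]:
    """Return the arXiv identifier from an /abs/ URL, or None if the URL does not match."""
    match = ARXIV_ABS.search(url)
    return match.group(1) if match else None
```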

@@ -0,0 +1,42 @@
# encoding: utf-8
# @Time : 2023/4/19
# @Author : Spike
# @Descr :
from toolbox import update_ui, get_conf
from toolbox import CatchException
from crazy_functions.crazy_utils import request_gpt_model_in_new_thread_with_ui_alive
@CatchException
def 猜你想问(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
if txt:
show_say = txt
prompt = txt+'\n回答完问题后,再列出用户可能提出的三个问题。'
else:
prompt = history[-1]+"\n分析上述回答,再列出用户可能提出的三个问题。"
show_say = '分析上述回答,再列出用户可能提出的三个问题。'
gpt_say = yield from request_gpt_model_in_new_thread_with_ui_alive(
inputs=prompt,
inputs_show_user=show_say,
llm_kwargs=llm_kwargs,
chatbot=chatbot,
history=history,
sys_prompt=system_prompt
)
chatbot[-1] = (show_say, gpt_say)
history.extend([show_say, gpt_say])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
@CatchException
def 清除缓存(txt, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, web_port):
chatbot.append(['清除本地缓存数据', '执行中. 删除数据'])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
import shutil, os
PATH_PRIVATE_UPLOAD, PATH_LOGGING = get_conf('PATH_PRIVATE_UPLOAD', 'PATH_LOGGING')
shutil.rmtree(PATH_LOGGING, ignore_errors=True)
shutil.rmtree(PATH_PRIVATE_UPLOAD, ignore_errors=True)
chatbot.append(['清除本地缓存数据', '执行完成'])
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面
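`清除缓存` deletes `PATH_LOGGING` and `PATH_PRIVATE_UPLOAD` with `shutil.rmtree(..., ignore_errors=True)` and then reports completion unconditionally. `ignore_errors=True` also swallows permission errors. Below is a minimal sketch that checks the result afterwards instead; `clear_dirs` is hypothetical.

```python
import os
import shutil
from typing import Iterable, List

def clear_dirs(paths: Iterable[str]) -> List[str]:
    """Delete each directory tree and return the paths that no longer exist afterwards."""
    cleared: List[str] = []
    for path in paths:
        # ignore_errors=True hides failures, so verify the outcome explicitly.
        shutil.rmtree(path, ignore_errors=True)
        if not os.path.exists(path):
            cleared.append(path)
    return cleared
```

A path missing from the returned list is one that could not be removed. The plugin could report such paths instead of always printing "执行完成".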

@@ -6,7 +6,7 @@ def 高阶功能模板函数(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
"""
txt 输入栏用户输入的文本,例如需要翻译的一段话,再例如一个包含了待处理文件的路径
llm_kwargs gpt模型参数,如温度和top_p等,一般原样传递下去就行
plugin_kwargs 插件模型的参数,暂时没有用武之地
plugin_kwargs 插件模型的参数,用于灵活调整复杂功能的各种参数
chatbot 聊天显示框的句柄,用于显示给用户
history 聊天历史,前情提要
system_prompt 给gpt的静默提醒
@@ -26,4 +26,4 @@ def 高阶功能模板函数(txt, llm_kwargs, plugin_kwargs, chatbot, history, s
)
chatbot[-1] = (i_say, gpt_say)
history.append(i_say);history.append(gpt_say)
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 界面更新
yield from update_ui(chatbot=chatbot, history=history) # 刷新界面 # 界面更新

@@ -1,55 +1,129 @@
【请在修改完参数后,删除此行】请在以下方案中选择一种,然后删除其他的方案,最后docker-compose up运行
## ===================================================
# docker-compose.yml
## ===================================================
# 1. 请在以下方案中选择任意一种,然后删除其他的方案
# 2. 修改你选择的方案中的environment环境变量,详情请见github wiki或者config.py
# 3. 选择一种暴露服务端口的方法,并对相应的配置做出修改:
# 【方法1: 适用于Linux,很方便,可惜windows不支持】与宿主的网络融合为一体,这个是默认配置
# network_mode: "host"
# 【方法2: 适用于所有系统包括Windows和MacOS】端口映射,把容器的端口映射到宿主的端口(注意您需要先删除network_mode: "host",再追加以下内容)
# ports:
# - "12345:12345" # 注意12345必须与WEB_PORT环境变量相互对应
# 4. 最后`docker-compose up`运行
# 5. 如果希望使用显卡,请关注 LOCAL_MODEL_DEVICE 和 英伟达显卡运行时 选项
## ===================================================
# 1. Please choose one of the following options and delete the others.
# 2. Modify the environment variables in the selected option, see GitHub wiki or config.py for more details.
# 3. Choose a method to expose the server port and make the corresponding configuration changes:
# [Method 1: Suitable for Linux, convenient, but not supported for Windows] Fusion with the host network, this is the default configuration
# network_mode: "host"
# [Method 2: Suitable for all systems including Windows and MacOS] Port mapping, mapping the container port to the host port (note that you need to delete network_mode: "host" first, and then add the following content)
# ports:
# - "12345:12345" # Note! 12345 must correspond to the WEB_PORT environment variable.
# 4. Finally, run `docker-compose up`.
# 5. If you want to use a graphics card, pay attention to the LOCAL_MODEL_DEVICE and Nvidia GPU runtime options.
## ===================================================
## ===================================================
## 【方案一】 如果不需要运行本地模型(仅chatgpt类远程服务)
## 【方案零】 部署项目的全部能力(这个是包含cuda和latex的大型镜像。如果您网速慢、硬盘小或没有显卡,则不推荐使用这个)
## ===================================================
version: '3'
services:
gpt_academic_full_capability:
image: ghcr.io/binary-husky/gpt_academic_with_all_capacity:master
environment:
# 请查阅 `config.py`或者 github wiki 以查看所有的配置信息
API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
# USE_PROXY: ' True '
# proxies: ' { "http": "http://localhost:10881", "https": "http://localhost:10881", } '
LLM_MODEL: ' gpt-3.5-turbo '
AVAIL_LLM_MODELS: ' ["gpt-3.5-turbo", "gpt-4", "qianfan", "sparkv2", "spark", "chatglm"] '
BAIDU_CLOUD_API_KEY : ' xxxxxxxxxxxxxxxxxxxxxxxx '
BAIDU_CLOUD_SECRET_KEY : ' xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
XFYUN_APPID: ' xxxxxxxx '
XFYUN_API_SECRET: ' xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
XFYUN_API_KEY: ' xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
ENABLE_AUDIO: ' False '
DEFAULT_WORKER_NUM: ' 20 '
WEB_PORT: ' 12345 '
ADD_WAIFU: ' False '
ALIYUN_APPKEY: ' xxxxxxxxxxxxxxxx '
THEME: ' Chuanhu-Small-and-Beautiful '
ALIYUN_ACCESSKEY: ' xxxxxxxxxxxxxxxxxxxxxxxx '
ALIYUN_SECRET: ' xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
# LOCAL_MODEL_DEVICE: ' cuda '
# 加载英伟达显卡运行时
# runtime: nvidia
# deploy:
# resources:
# reservations:
# devices:
# - driver: nvidia
# count: 1
# capabilities: [gpu]
# 【WEB_PORT暴露方法1: 适用于Linux】与宿主的网络融合
network_mode: "host"
# 【WEB_PORT暴露方法2: 适用于所有系统】端口映射
# ports:
# - "12345:12345" # 12345必须与WEB_PORT相互对应
# 启动容器后,运行main.py主程序
command: >
bash -c "python3 -u main.py"
## ===================================================
## 【方案一】 如果不需要运行本地模型(仅 chatgpt, azure, 星火, 千帆, claude 等在线大模型服务)
## ===================================================
version: '3'
services:
gpt_academic_nolocalllms:
image: fuqingxu/gpt_academic:no-local-llms
image: ghcr.io/binary-husky/gpt_academic_nolocal:master # (Auto Built by Dockerfile: docs/GithubAction+NoLocal)
environment:
# 请查阅 `config.py` 以查看所有的配置信息
API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,fkxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
USE_PROXY: ' True '
proxies: ' { "http": "socks5h://localhost:10880", "https": "socks5h://localhost:10880", } '
LLM_MODEL: ' gpt-3.5-turbo '
AVAIL_LLM_MODELS: ' ["gpt-3.5-turbo", "api2d-gpt-4"] '
DEFAULT_WORKER_NUM: ' 10 '
AVAIL_LLM_MODELS: ' ["gpt-3.5-turbo", "api2d-gpt-3.5-turbo", "gpt-4", "api2d-gpt-4", "sparkv2", "qianfan"] '
WEB_PORT: ' 22303 '
ADD_WAIFU: ' True '
AUTHENTICATION: ' [("username", "passwd"), ("username2", "passwd2")] '
# THEME: ' Chuanhu-Small-and-Beautiful '
# DEFAULT_WORKER_NUM: ' 10 '
# AUTHENTICATION: ' [("username", "passwd"), ("username2", "passwd2")] '
# Share the host's network stack
network_mode: "host"
# Pull the latest code without using a proxy
command: >
bash -c " echo '[gpt-academic] Pulling the latest code from GitHub...' &&
git checkout master --force &&
git remote set-url origin https://github.com/binary-husky/chatgpt_academic.git &&
git pull &&
python3 -u main.py"
bash -c "python3 -u main.py"
### ===================================================
### [Option 2] If you need to run the ChatGLM local model
### [Option 2] If you need to run local models such as ChatGLM + Qwen + MOSS
### ===================================================
version: '3'
services:
gpt_academic_with_chatglm:
image: fuqingxu/gpt_academic:chatgpt-chatglm-newbing # [option 2] If you need to run the ChatGLM local model
image: ghcr.io/binary-husky/gpt_academic_chatglm_moss:master # (Auto Built by Dockerfile: docs/Dockerfile+ChatGLM)
environment:
# See `config.py` for all configuration options
API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,fkxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
USE_PROXY: ' True '
proxies: ' { "http": "socks5h://localhost:10880", "https": "socks5h://localhost:10880", } '
LLM_MODEL: ' gpt-3.5-turbo '
AVAIL_LLM_MODELS: ' ["gpt-3.5-turbo", "api2d-gpt-4", "chatglm"] '
AVAIL_LLM_MODELS: ' ["chatglm", "qwen", "moss", "gpt-3.5-turbo", "gpt-4", "newbing"] '
LOCAL_MODEL_DEVICE: ' cuda '
DEFAULT_WORKER_NUM: ' 10 '
WEB_PORT: ' 12303 '
ADD_WAIFU: ' True '
AUTHENTICATION: ' [("username", "passwd"), ("username2", "passwd2")] '
# AUTHENTICATION: ' [("username", "passwd"), ("username2", "passwd2")] '
# GPU usage: nvidia0 refers to GPU 0
runtime: nvidia
@@ -58,21 +132,12 @@ services:
# Share the host's network stack
network_mode: "host"
# Pull the latest code through the proxy network
# command: >
# bash -c " echo '[gpt-academic] Pulling the latest code from GitHub...' &&
# truncate -s -1 /etc/proxychains.conf &&
# echo \"socks5 127.0.0.1 10880\" >> /etc/proxychains.conf &&
# proxychains git pull &&
# python3 -u main.py "
# Pull the latest code without using a proxy
command: >
bash -c " echo '[gpt-academic] Pulling the latest code from GitHub...' &&
git pull &&
python3 -u main.py"
bash -c "python3 -u main.py"
# P.S. By tweaking command, you can easily install extra dependencies
# command: >
# bash -c "pip install -r request_llm/requirements_qwen.txt && python3 -u main.py"
### ===================================================
### [Option 3] If you need to run ChatGPT + LLAMA + PanGu (盘古) + RWKV local models
@@ -80,14 +145,14 @@ services:
version: '3'
services:
gpt_academic_with_rwkv:
image: fuqingxu/gpt_academic:jittorllms # [option 2] If you need to run the ChatGLM local model
image: ghcr.io/binary-husky/gpt_academic_jittorllms:master
environment:
# See `config.py` for all configuration options
API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,fkxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
USE_PROXY: ' True '
proxies: ' { "http": "socks5h://localhost:10880", "https": "socks5h://localhost:10880", } '
LLM_MODEL: ' gpt-3.5-turbo '
AVAIL_LLM_MODELS: ' ["gpt-3.5-turbo", "api2d-gpt-4", "jittorllms_rwkv"] '
AVAIL_LLM_MODELS: ' ["gpt-3.5-turbo", "newbing", "jittorllms_rwkv", "jittorllms_pangualpha", "jittorllms_llama"] '
LOCAL_MODEL_DEVICE: ' cuda '
DEFAULT_WORKER_NUM: ' 10 '
WEB_PORT: ' 12305 '
@@ -102,21 +167,66 @@ services:
# Share the host's network stack
network_mode: "host"
# Pull the latest code through the proxy network
# command: >
# bash -c " truncate -s -1 /etc/proxychains.conf &&
# echo \"socks5 127.0.0.1 10880\" >> /etc/proxychains.conf &&
# echo '[gpt-academic] Pulling the latest code from GitHub...' &&
# proxychains git pull &&
# echo '[jittorllms] Pulling the latest code from GitHub...' &&
# proxychains git --git-dir=request_llm/jittorllms/.git --work-tree=request_llm/jittorllms pull --force &&
# python3 -u main.py"
# Pull the latest code through the proxy network
command: >
python3 -u main.py
## ===================================================
## [Option 4] ChatGPT + LaTeX
## ===================================================
version: '3'
services:
gpt_academic_with_latex:
image: ghcr.io/binary-husky/gpt_academic_with_latex:master # (Auto Built by Dockerfile: docs/GithubAction+NoLocal+Latex)
environment:
# See `config.py` for all configuration options
API_KEY: ' sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx '
USE_PROXY: ' True '
proxies: ' { "http": "socks5h://localhost:10880", "https": "socks5h://localhost:10880", } '
LLM_MODEL: ' gpt-3.5-turbo '
AVAIL_LLM_MODELS: ' ["gpt-3.5-turbo", "gpt-4"] '
LOCAL_MODEL_DEVICE: ' cuda '
DEFAULT_WORKER_NUM: ' 10 '
WEB_PORT: ' 12303 '
# Share the host's network stack
network_mode: "host"
# Pull the latest code without using a proxy
command: >
bash -c " echo '[gpt-academic] Pulling the latest code from GitHub...' &&
git pull &&
echo '[jittorllms] Pulling the latest code from GitHub...' &&
git --git-dir=request_llm/jittorllms/.git --work-tree=request_llm/jittorllms pull --force &&
python3 -u main.py"
bash -c "python3 -u main.py"
## ===================================================
## [Option 5] ChatGPT + voice assistant (read docs/use_audio.md first)
## ===================================================
version: '3'
services:
gpt_academic_with_audio:
image: ghcr.io/binary-husky/gpt_academic_audio_assistant:master
environment:
# See `config.py` for all configuration options
API_KEY: ' fk195831-IdP0Pb3W6DCMUIbQwVX6MsSiyxwqybyS '
USE_PROXY: ' False '
proxies: ' None '
LLM_MODEL: ' gpt-3.5-turbo '
AVAIL_LLM_MODELS: ' ["gpt-3.5-turbo", "gpt-4"] '
ENABLE_AUDIO: ' True '
LOCAL_MODEL_DEVICE: ' cuda '
DEFAULT_WORKER_NUM: ' 20 '
WEB_PORT: ' 12343 '
ADD_WAIFU: ' True '
THEME: ' Chuanhu-Small-and-Beautiful '
ALIYUN_APPKEY: ' RoP1ZrM84DnAFkZK '
ALIYUN_TOKEN: ' f37f30e0f9934c34a992f6f64f7eba4f '
# (no need to fill in) ALIYUN_ACCESSKEY: ' LTAI5q6BrFUzoRXVGUWnekh1 '
# (no need to fill in) ALIYUN_SECRET: ' eHmI20AVWIaQZ0CiTD2bGQVsaP9i68 '
# Share the host's network stack
network_mode: "host"
# Pull the latest code without using a proxy
command: >
bash -c "python3 -u main.py"
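The `environment` values in the compose options above are quoted, space-padded strings (for example `' True '`, `' 12343 '` or `' ["gpt-3.5-turbo", "gpt-4"] '`) that mirror settings in `config.py`. The sketch below shows one way such values could be turned back into typed Python values. It assumes the default in `config.py` decides the target type; `parse_env_value` is a hypothetical helper, and the project's own parsing may differ.

```python
import ast
from typing import Any


def parse_env_value(raw: str, default: Any) -> Any:
    """Convert a padded docker-compose env string into the type of `default`.

    Hypothetical helper: the target type is assumed to come from the default
    value in config.py; the project's real parser may behave differently.
    """
    text = raw.strip()  # drop the alignment padding inside the quotes
    if isinstance(default, bool):  # test bool before int: bool subclasses int
        if text in ("True", "False"):
            return text == "True"
        raise ValueError(f"expected True/False, got {text!r}")
    if isinstance(default, int):
        return int(text)
    if isinstance(default, float):
        return float(text)
    if isinstance(default, (list, dict)) or default is None:
        # literal_eval accepts only Python literals, so nothing is executed
        return ast.literal_eval(text)
    return text  # plain strings such as LLM_MODEL or API_KEY


env = {
    "USE_PROXY": " True ",
    "WEB_PORT": " 12343 ",
    "AVAIL_LLM_MODELS": ' ["gpt-3.5-turbo", "gpt-4"] ',
    "proxies": ' { "http": "socks5h://localhost:10880", "https": "socks5h://localhost:10880", } ',
    "LLM_MODEL": " gpt-3.5-turbo ",
}
# Assumed defaults, standing in for the values defined in config.py
defaults = {"USE_PROXY": False, "WEB_PORT": 0, "AVAIL_LLM_MODELS": [], "proxies": None, "LLM_MODEL": ""}
config = {key: parse_env_value(value, defaults[key]) for key, value in env.items()}
print(config["WEB_PORT"], config["AVAIL_LLM_MODELS"])
```

Using `ast.literal_eval` rather than `eval` keeps a malformed or hostile environment value from executing code; the trailing comma in the `proxies` dict is valid Python literal syntax, so it parses as-is.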

@@ -1,62 +1,2 @@
# How to build: docker build -t gpt-academic --network=host -f Dockerfile+ChatGLM .
# How to run | (1) Run directly in one step, using GPU 0: docker run --rm -it --net=host --gpus \"device=0\" gpt-academic
# How to run | (2) Enter the container to make adjustments before running, using GPU 1: docker run --rm -it --net=host --gpus \"device=1\" gpt-academic bash
# Use the NVIDIA CUDA base image for GPU support; the CUDA version reported by the host's nvidia-smi must be >= 11.3
FROM nvidia/cuda:11.3.1-runtime-ubuntu20.04
ARG useProxyNetwork=''
RUN apt-get update
RUN apt-get install -y curl proxychains curl
RUN apt-get install -y git python python3 python-dev python3-dev --fix-missing
# This Dockerfile is no longer maintained; see docs/GithubAction+ChatGLM+Moss
# Configure the proxy network (used when building the Docker image)
# # comment out below if you do not need a proxy network (delete from this line down)
RUN $useProxyNetwork curl cip.cc
RUN sed -i '$ d' /etc/proxychains.conf
RUN sed -i '$ d' /etc/proxychains.conf
# Fill in the host's proxy protocol here; it is used to pull code from GitHub
RUN echo "socks5 127.0.0.1 10880" >> /etc/proxychains.conf
ARG useProxyNetwork=proxychains
# # comment out above if you do not need a proxy network (delete from this line up)
# use python3 as the system default python
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
# Download PyTorch
RUN $useProxyNetwork python3 -m pip install torch --extra-index-url https://download.pytorch.org/whl/cu113
# Clone the branch
WORKDIR /gpt
RUN $useProxyNetwork git clone https://github.com/binary-husky/chatgpt_academic.git
WORKDIR /gpt/chatgpt_academic
RUN $useProxyNetwork python3 -m pip install -r requirements.txt
RUN $useProxyNetwork python3 -m pip install -r request_llm/requirements_chatglm.txt
RUN $useProxyNetwork python3 -m pip install -r request_llm/requirements_newbing.txt
# Pre-warm the ChatGLM weights (not required; optional step)
RUN echo ' \n\
from transformers import AutoModel, AutoTokenizer \n\
chatglm_tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True) \n\
chatglm_model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float() ' >> warm_up_chatglm.py
RUN python3 -u warm_up_chatglm.py
# Bust the build cache so that the code is updated
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
RUN $useProxyNetwork git pull
# Pre-warm the Tiktoken module
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
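# Configure the proxy and API-KEY for chatgpt-academic (not required; optional step)
# Multiple API-KEYs can be given at once; OpenAI keys and API2D keys can coexist, separated by ASCII commas, e.g. API_KEY = "sk-openaikey1,fkxxxx-api2dkey2,........"
# LLM_MODEL selects the initial model
# LOCAL_MODEL_DEVICE selects the device that local models such as chatglm run on: cpu or cuda
# [Note: the entries below correspond one-to-one with `config.py`; consult config.py to fill in the settings below]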
RUN echo ' \n\
API_KEY = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,fkxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \n\
USE_PROXY = True \n\
LLM_MODEL = "chatglm" \n\
LOCAL_MODEL_DEVICE = "cuda" \n\
proxies = { "http": "socks5h://localhost:10880", "https": "socks5h://localhost:10880", } ' >> config_private.py
# Launch
CMD ["python3", "-u", "main.py"]
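The config comment above says several keys can go into `API_KEY` at once, comma-separated, with OpenAI and API2D keys side by side. The sketch below splits such a string. It assumes, from the example values only, that OpenAI keys start with `sk-` and API2D keys with `fk`; `split_api_keys` is a hypothetical helper, not the project's function, and real key validation is likely stricter.

```python
from typing import Dict, List


def split_api_keys(api_key: str) -> Dict[str, List[str]]:
    """Group a comma-separated API_KEY string by provider (hypothetical helper).

    Assumes the prefixes seen in the examples: "sk-" for OpenAI, "fk" for API2D.
    """
    groups: Dict[str, List[str]] = {"openai": [], "api2d": [], "unknown": []}
    for key in api_key.split(","):
        key = key.strip()  # tolerate spaces around the commas
        if not key:
            continue  # skip empty entries, e.g. from a trailing comma
        if key.startswith("sk-"):
            groups["openai"].append(key)
        elif key.startswith("fk"):
            groups["api2d"].append(key)
        else:
            groups["unknown"].append(key)
    return groups


keys = split_api_keys("sk-openaikey1,fkxxxx-api2dkey2")
print(keys["openai"], keys["api2d"])
```

Keeping an `unknown` bucket instead of silently dropping unrecognized entries makes a mistyped key visible rather than quietly ignored.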

docs/Dockerfile+JittorLLM

@@ -0,0 +1 @@
# This Dockerfile is no longer maintained; see docs/GithubAction+JittorLLMs

@@ -0,0 +1 @@
# This Dockerfile is no longer maintained; see docs/GithubAction+NoLocal+Latex

@@ -0,0 +1,36 @@
# docker build -t gpt-academic-all-capacity -f docs/GithubAction+AllCapacity --network=host --build-arg http_proxy=http://localhost:10881 --build-arg https_proxy=http://localhost:10881 .
# Use the NVIDIA CUDA base image for GPU support; the CUDA version reported by the host's nvidia-smi must be >= 11.3
FROM fuqingxu/11.3.1-runtime-ubuntu20.04-with-texlive:latest
# use python3 as the system default python
WORKDIR /gpt
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
# Download PyTorch
RUN python3 -m pip install torch --extra-index-url https://download.pytorch.org/whl/cu113
# Install pip dependencies
RUN python3 -m pip install openai numpy arxiv rich
RUN python3 -m pip install colorama Markdown pygments pymupdf
RUN python3 -m pip install python-docx moviepy pdfminer
RUN python3 -m pip install zh_langchain==0.2.1 pypinyin
RUN python3 -m pip install rarfile py7zr
RUN python3 -m pip install aliyun-python-sdk-core==2.13.3 pyOpenSSL scipy git+https://github.com/aliyun/alibabacloud-nls-python-sdk.git
# Clone the branch
WORKDIR /gpt
RUN git clone --depth=1 https://github.com/binary-husky/gpt_academic.git
WORKDIR /gpt/gpt_academic
RUN git clone --depth=1 https://github.com/OpenLMLab/MOSS.git request_llm/moss
RUN python3 -m pip install -r requirements.txt
RUN python3 -m pip install -r request_llm/requirements_moss.txt
RUN python3 -m pip install -r request_llm/requirements_qwen.txt
RUN python3 -m pip install -r request_llm/requirements_chatglm.txt
RUN python3 -m pip install -r request_llm/requirements_newbing.txt
RUN python3 -m pip install nougat-ocr
# Pre-warm the Tiktoken module
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
# Launch
CMD ["python3", "-u", "main.py"]
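These images are built on CUDA 11.3, and the header comment requires the CUDA version shown by the host's `nvidia-smi` to be at least 11.3. The sketch below performs that check on the text `nvidia-smi` prints. It assumes the `CUDA Version: X.Y` field that appears in the `nvidia-smi` header; the sample line and the helper names are illustrative. On a real host the text could be captured with `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.

```python
import re
from typing import Optional, Tuple

REQUIRED = (11, 3)  # minimum CUDA version stated in the Dockerfile comment


def parse_cuda_version(nvidia_smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field, if present."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_version_ok(nvidia_smi_output: str, required: Tuple[int, int] = REQUIRED) -> bool:
    """True if the reported CUDA version is at least `required` (hypothetical helper)."""
    version = parse_cuda_version(nvidia_smi_output)
    # Tuples compare element-wise, so (12, 2) >= (11, 3) but (11, 2) < (11, 3)
    return version is not None and version >= required


# Illustrative header line; real nvidia-smi output contains more text around it
sample = "| NVIDIA-SMI 535.104.05   Driver Version: 535.104.05   CUDA Version: 12.2     |"
print(parse_cuda_version(sample), cuda_version_ok(sample))
```

Comparing `(major, minor)` tuples avoids the classic bug of comparing versions as floats or strings, where `"11.10"` would sort below `"11.3"`.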

@@ -0,0 +1,30 @@
# Use the NVIDIA CUDA base image for GPU support; the CUDA version reported by the host's nvidia-smi must be >= 11.3
FROM nvidia/cuda:11.3.1-runtime-ubuntu20.04
RUN apt-get update
RUN apt-get install -y curl proxychains curl gcc
RUN apt-get install -y git python python3 python-dev python3-dev --fix-missing
# use python3 as the system default python
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
# Download PyTorch
RUN python3 -m pip install torch --extra-index-url https://download.pytorch.org/whl/cu113
# Clone the branch
WORKDIR /gpt
RUN git clone --depth=1 https://github.com/binary-husky/gpt_academic.git
WORKDIR /gpt/gpt_academic
RUN git clone https://github.com/OpenLMLab/MOSS.git request_llm/moss
RUN python3 -m pip install -r requirements.txt
RUN python3 -m pip install -r request_llm/requirements_moss.txt
RUN python3 -m pip install -r request_llm/requirements_qwen.txt
RUN python3 -m pip install -r request_llm/requirements_chatglm.txt
RUN python3 -m pip install -r request_llm/requirements_newbing.txt
# Pre-warm the Tiktoken module
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
# Launch
CMD ["python3", "-u", "main.py"]

@@ -0,0 +1,34 @@
# Use the NVIDIA CUDA base image for GPU support; the CUDA version reported by the host's nvidia-smi must be >= 11.3
FROM nvidia/cuda:11.3.1-runtime-ubuntu20.04
ARG useProxyNetwork=''
RUN apt-get update
RUN apt-get install -y curl proxychains curl g++
RUN apt-get install -y git python python3 python-dev python3-dev --fix-missing
# use python3 as the system default python
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.8
# Download PyTorch
RUN python3 -m pip install torch --extra-index-url https://download.pytorch.org/whl/cu113
# Clone the branch
WORKDIR /gpt
RUN git clone --depth=1 https://github.com/binary-husky/gpt_academic.git
WORKDIR /gpt/gpt_academic
RUN python3 -m pip install -r requirements.txt
RUN python3 -m pip install -r request_llm/requirements_chatglm.txt
RUN python3 -m pip install -r request_llm/requirements_newbing.txt
RUN python3 -m pip install -r request_llm/requirements_jittorllms.txt -i https://pypi.jittor.org/simple -I
# Download JittorLLMs
RUN git clone https://github.com/binary-husky/JittorLLMs.git --depth 1 request_llm/jittorllms
# Bust the build cache so that the code is updated
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
RUN git pull
# Pre-warm the Tiktoken module
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
# Launch
CMD ["python3", "-u", "main.py"]
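The `ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache` line is there so that the following `RUN git pull` is not served from the build cache: the downloaded bytes typically differ on every build, so the cache check misses at that step and every later step reruns. The toy model below illustrates only the chaining idea. It assumes each step's key hashes the parent key, the instruction text and any fetched content; this is a simplification, not Docker's actual cache algorithm.

```python
import hashlib
from typing import List, Optional, Tuple

# (instruction, fetched_content); content is None for steps that fetch nothing
Step = Tuple[str, Optional[bytes]]


def cache_keys(steps: List[Step]) -> List[str]:
    """Toy model: each key chains the parent key, the instruction text and any
    fetched content. Not Docker's real cache algorithm."""
    keys: List[str] = []
    parent = ""
    for instruction, content in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        if content is not None:
            h.update(content)  # remote ADD: the key depends on the downloaded bytes
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def build(random_bytes: bytes) -> List[str]:
    """Simulate the tail of the Dockerfile above with a given random.org payload."""
    return cache_keys([
        ("RUN python3 -m pip install -r requirements.txt", None),
        ('ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache', random_bytes),
        ("RUN git pull", None),
    ])


first, second = build(b"a1b2c3"), build(b"d4e5f6")
# The step before ADD keeps its key (cache hit); ADD and every later step change.
print([a == b for a, b in zip(first, second)])  # [True, False, False]
```

Placing the cache-busting `ADD` immediately before `git pull` keeps the expensive dependency layers above it cached while still forcing a fresh checkout.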

docs/GithubAction+NoLocal

@@ -0,0 +1,20 @@
# This Dockerfile builds an environment without local models; if you need local models such as chatglm, see docs/Dockerfile+ChatGLM
# How to build: first edit `config.py`, then run docker build -t gpt-academic-nolocal -f docs/Dockerfile+NoLocal .
# How to run: docker run --rm -it --net=host gpt-academic-nolocal
FROM python:3.11
# Set the working directory
WORKDIR /gpt
# Copy in the project files
COPY . .
# Install dependencies
RUN pip3 install -r requirements.txt
# Optional step: pre-warm modules
RUN python3 -c 'from check_proxy import warm_up_modules; warm_up_modules()'
# Launch
CMD ["python3", "-u", "main.py"]
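The header comments in these Dockerfiles give the build and run commands (`docker build -t TAG -f DOCKERFILE .`, optionally with `--network=host`, and `docker run --rm -it --net=host TAG`, optionally with `--gpus \"device=N\"` and a trailing `bash`). The sketch below assembles those commands as argument lists suitable for `subprocess.run`. `docker_build_cmd` and `docker_run_cmd` are hypothetical helpers, and they use only the flags already shown in the comments.

```python
import shlex
from typing import List, Optional


def docker_build_cmd(tag: str, dockerfile: str, network_host: bool = True) -> List[str]:
    """Hypothetical helper: argv for `docker build -t TAG [--network=host] -f DOCKERFILE .`"""
    cmd = ["docker", "build", "-t", tag]
    if network_host:
        cmd.append("--network=host")  # lets the build reach a proxy on localhost
    cmd += ["-f", dockerfile, "."]
    return cmd


def docker_run_cmd(tag: str, gpu: Optional[int] = None, shell: bool = False) -> List[str]:
    """Hypothetical helper: argv for `docker run --rm -it --net=host [--gpus "device=N"] TAG [bash]`"""
    cmd = ["docker", "run", "--rm", "-it", "--net=host"]
    if gpu is not None:
        # Same value the shell sees in the comments: the literal string "device=N" with quotes
        cmd += ["--gpus", f'"device={gpu}"']
    cmd.append(tag)
    if shell:
        cmd.append("bash")  # open a shell instead of the default CMD
    return cmd


build_cmd = docker_build_cmd("gpt-academic-nolocal", "docs/GithubAction+NoLocal", network_host=False)
run_cmd = docker_run_cmd("gpt-academic", gpu=1, shell=True)
print(shlex.join(build_cmd))
print(shlex.join(run_cmd))
```

Either list can be passed to `subprocess.run(cmd, check=True)`; building an argument list instead of a single shell string avoids the quoting and escaping issues visible in the `\"device=0\"` comments.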

Some files are not shown because too many files changed in this diff.