Master 4.0 (#2210)

* stage academic conversation

* stage document conversation

* fix buggy gradio version

* file dynamic load

* merge more academic plugins

* accelerate nltk

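* feat: add file and URL reading to the predict function
- Add URL detection and web page content extraction, with automatic extraction of page text
- Add file path recognition and file content reading, with support for the private_upload path format
- Integrate WebTextExtractor to handle web page content extraction
- Integrate TextContentLoader to handle local file reading
- Support intelligent handling of a file path combined with a question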

* back

* block unstable

---------

Co-authored-by: XiaoBoAI <liuboyin2019@ia.ac.cn>
Author: binary-husky
Date: 2025-08-23 15:59:22 +08:00
Committer: GitHub
Parent: 65a4cf59c2
Commit: 8042750d41
79 files changed, 20850 insertions, 57 deletions


@@ -0,0 +1,55 @@
# Crossref query optimization prompt
CROSSREF_QUERY_PROMPT = """Analyze and optimize the query for Crossref search.
Query: {query}
Task: Transform the natural language query into an optimized Crossref search query.
Always generate English search terms regardless of the input language.
IMPORTANT: Ignore any requirements about journal ranking (CAS, JCR, IF index),
or output format requirements. Focus only on the core research topic for the search query.
Available search fields and filters:
1. Basic fields:
- title: Search in title
- abstract: Search in abstract
- author: Search for author names
- container-title: Search in journal/conference name
- publisher: Search by publisher name
- type: Filter by work type (journal-article, book-chapter, etc.)
- year: Filter by publication year
2. Boolean operators:
- AND: Both terms must appear
- OR: Either term can appear
- NOT: Exclude terms
- "": Exact phrase match
3. Special filters:
- is-referenced-by-count: Filter by citation count
- from-pub-date: Filter by publication date
- has-abstract: Filter papers with abstracts
Examples:
1. Query: "Machine learning in healthcare after 2020"
<query>title:"machine learning" AND title:healthcare AND from-pub-date:2020</query>
2. Query: "Papers by Geoffrey Hinton about deep learning"
<query>author:"Hinton, Geoffrey" AND (title:"deep learning" OR abstract:"deep learning")</query>
3. Query: "Most cited papers about transformers in Nature"
<query>title:transformer AND container-title:Nature AND is-referenced-by-count:[100 TO *]</query>
4. Query: "Recent BERT applications in medical domain"
<query>title:BERT AND abstract:medical AND from-pub-date:2020 AND type:journal-article</query>
Please analyze the query and respond ONLY with XML tags:
<query>Provide the optimized Crossref search query using appropriate fields and operators</query>"""
# System prompt
CROSSREF_QUERY_SYSTEM_PROMPT = """You are an expert at crafting Crossref search queries.
Your task is to optimize natural language queries for Crossref's API.
Focus on creating precise queries that will return relevant results.
Always generate English search terms regardless of the input language.
Consider using field-specific search terms and appropriate filters to improve search accuracy."""
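
The template above takes a single `{query}` placeholder and asks the model to answer inside `<query>` tags. A minimal sketch of how such a template might be filled in and the reply parsed is shown below. The names `PROMPT_TEMPLATE`, `build_prompt`, and `extract_query` are hypothetical and do not appear in this commit, and the short template is only a stand-in for `CROSSREF_QUERY_PROMPT`.

```python
import re
from typing import Optional

# Hypothetical, shortened stand-in for CROSSREF_QUERY_PROMPT (assumption:
# the real template is filled the same way, via its {query} placeholder).
PROMPT_TEMPLATE = (
    "Query: {query}\n"
    "Please analyze the query and respond ONLY with XML tags:\n"
    "<query>optimized search query</query>"
)


def build_prompt(user_query: str, template: str = PROMPT_TEMPLATE) -> str:
    """Substitute the user's natural language query into the template."""
    return template.format(query=user_query)


def extract_query(response: str) -> Optional[str]:
    """Return the text of the first <query>...</query> pair, or None if absent."""
    match = re.search(r"<query>(.*?)</query>", response, re.DOTALL)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    prompt = build_prompt("Recent BERT applications in medical domain")
    # A made-up model reply, used only to exercise the parser.
    reply = "<query>title:BERT AND abstract:medical AND from-pub-date:2020</query>"
    print(extract_query(reply))
```

Because the prompt itself contains `<query>` tags in its examples, the parser should be applied to the model's reply only, never to the prompt text.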