File
gpt_academic/crazy_functions/review_fns/prompts/arxiv_prompts.py
binary-husky 8042750d41 Master 4.0 (#2210)
* stage academic conversation

* stage document conversation

* fix buggy gradio version

* file dynamic load

* merge more academic plugins

* accelerate nltk

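* feat: add file and URL reading to the predict function
- Add URL detection and web content extraction, with automatic extraction of page text
- Add file path recognition and file content reading, supporting the private_upload path format
- Integrate WebTextExtractor for web content extraction
- Integrate TextContentLoader for local file reading
- Support smart handling of a file path combined with a question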

* back

* block unstable

---------

Co-authored-by: XiaoBoAI <liuboyin2019@ia.ac.cn>
2025-08-23 15:59:22 +08:00

342 lines
12 KiB
Python

# Basic type analysis prompt
ARXIV_TYPE_PROMPT = """Analyze the research query and determine if arXiv search is needed and its type.
Query: {query}
Task 1: Determine if this query requires arXiv search
- arXiv is suitable for:
* Computer science and AI/ML
* Physics and mathematics
* Quantitative biology and finance
* Electrical engineering
* Recent preprints in these fields
- arXiv is NOT needed for:
* Medical research (unless ML/AI applications)
* Social sciences
* Business studies
* Humanities
* Industry reports
Task 2: If arXiv search is needed, determine the most appropriate search type
Available types:
1. basic: Keyword-based search across all fields
- For specific technical queries
- When looking for particular methods or applications
2. category: Category-based search within specific fields
- For broad topic exploration
- When surveying a research area
3. none: arXiv search not needed for this query
- When topic is outside arXiv's scope
- For non-technical or clinical research
Examples:
1. Query: "BERT transformer architecture"
<search_type>basic</search_type>
2. Query: "latest developments in machine learning"
<search_type>category</search_type>
3. Query: "COVID-19 clinical trials"
<search_type>none</search_type>
4. Query: "psychological effects of social media"
<search_type>none</search_type>
Please analyze the query and respond ONLY with XML tags:
<search_type>Choose either 'basic', 'category', or 'none'</search_type>"""
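# A minimal sketch of how a reply to the prompt above could be parsed. The helper name
# parse_search_type and the "basic" fallback are assumptions made for illustration, not
# part of this file; the only thing taken from the prompt is the <search_type> tag and
# its three allowed values.

```python
import re

# The three values the prompt asks the model to choose from.
VALID_SEARCH_TYPES = {"basic", "category", "none"}


def parse_search_type(response: str, default: str = "basic") -> str:
    """Extract the <search_type> value from a model reply.

    Falls back to `default` (an assumed choice) when the tag is missing
    or holds a value outside the three allowed types.
    """
    match = re.search(
        r"<search_type>\s*(.*?)\s*</search_type>",
        response,
        re.DOTALL | re.IGNORECASE,
    )
    if match is None:
        return default
    # Tolerate stray quotes and capitalisation in the reply.
    value = match.group(1).strip().strip("'\"").lower()
    return value if value in VALID_SEARCH_TYPES else default
```

# Validating against a fixed set keeps an echoed placeholder or a malformed reply
# from reaching the search step.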
# Query optimization prompt
ARXIV_QUERY_PROMPT = """Optimize the following query for arXiv search.
Query: {query}
Task: Transform the natural language query into an optimized arXiv search query using boolean operators and field tags.
Always generate English search terms regardless of the input language.
IMPORTANT: Ignore any requirements about journal ranking (CAS, JCR, IF index),
or output format requirements. Focus only on the core research topic for the search query.
Available field tags:
- ti: Search in title
- abs: Search in abstract
- au: Search for author
- all: Search in all fields (default)
Boolean operators:
- AND: Both terms must appear
- OR: Either term can appear
- ANDNOT: Exclude terms
- (): Group terms
- "": Exact phrase match
Examples:
1. Natural query: "Recent papers about transformer models by Vaswani"
<query>ti:"transformer model" AND au:Vaswani AND submittedDate:[20170101 TO 99991231]</query>
2. Natural query: "Deep learning for computer vision, excluding surveys"
<query>ti:("deep learning" AND "computer vision") ANDNOT (ti:survey OR ti:review)</query>
3. Natural query: "Attention mechanism in language models"
<query>ti:(attention OR "attention mechanism") AND abs:"language model"</query>
4. Natural query: "GANs or generative adversarial networks for image generation"
<query>(ti:GAN OR ti:"generative adversarial network") AND abs:"image generation"</query>
Please analyze the query and respond ONLY with XML tags:
<query>Provide the optimized search query using appropriate operators and tags</query>
Note:
- Use quotes for exact phrases
- Combine multiple conditions with boolean operators
- Consider both title and abstract for important concepts
- Include author names when relevant
- Use parentheses for complex logical groupings"""
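# A hedged sketch of how an optimized query string might be turned into a request URL
# for the public arXiv API. The endpoint and the parameter names (search_query, start,
# max_results, sortBy, sortOrder) come from the arXiv API; the helper name
# build_arxiv_url and its defaults are hypothetical, and this repository may well use a
# different client.

```python
from urllib.parse import urlencode

# Public arXiv API query endpoint.
ARXIV_API_URL = "http://export.arxiv.org/api/query"


def build_arxiv_url(
    search_query: str,
    start: int = 0,
    max_results: int = 20,
    sort_by: str = "relevance",
    sort_order: str = "descending",
) -> str:
    """Return a percent-encoded arXiv API URL for the given search query."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    # urlencode escapes quotes, colons, and spaces in the query string.
    return f"{ARXIV_API_URL}?{urlencode(params)}"


url = build_arxiv_url('ti:"transformer model" AND au:Vaswani', max_results=5)
```

# Encoding the query in one place avoids hand-escaping the quotes and parentheses
# that the boolean syntax relies on.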
# Sort parameters prompt
ARXIV_SORT_PROMPT = """Determine optimal sorting parameters for the research query.
Query: {query}
Task: Select the most appropriate sorting parameters to help users find the most relevant papers.
Available sorting options:
1. Sort by:
- relevance: Best match to query terms (default)
- lastUpdatedDate: Most recently updated papers
- submittedDate: Most recently submitted papers
2. Sort order:
- descending: Newest/Most relevant first (default)
- ascending: Oldest/Least relevant first
3. Result limit:
- Minimum: 10 papers
- Maximum: 50 papers
- Recommended: 20-30 papers for most queries
Examples:
1. Query: "Latest developments in transformer models"
<sort_by>submittedDate</sort_by>
<sort_order>descending</sort_order>
<limit>30</limit>
2. Query: "Foundational papers about neural networks"
<sort_by>relevance</sort_by>
<sort_order>descending</sort_order>
<limit>20</limit>
3. Query: "Evolution of deep learning since 2012"
<sort_by>submittedDate</sort_by>
<sort_order>ascending</sort_order>
<limit>50</limit>
Please analyze the query and respond ONLY with XML tags:
<sort_by>Choose: relevance, lastUpdatedDate, or submittedDate</sort_by>
<sort_order>Choose: ascending or descending</sort_order>
<limit>Suggest a number between 10 and 50</limit>
Note:
- Choose relevance for specific technical queries
- Use lastUpdatedDate for tracking paper revisions
- Use submittedDate for following recent developments
- Consider query context when setting the limit"""
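# A minimal sketch of parsing the three sort tags requested above and enforcing the
# stated 10-50 range. The helper names and the fallback values (relevance, descending,
# 20) are assumptions chosen to match the defaults and recommendation in the prompt,
# not code from this file.

```python
import re

ALLOWED_SORT_BY = {"relevance", "lastUpdatedDate", "submittedDate"}
ALLOWED_SORT_ORDER = {"ascending", "descending"}


def _tag(response, name):
    """Return the stripped text inside <name>...</name>, or None if absent."""
    match = re.search(rf"<{name}>\s*(.*?)\s*</{name}>", response, re.DOTALL)
    return match.group(1) if match else None


def parse_sort_params(response):
    """Parse sort_by, sort_order, and limit, substituting defaults for bad values."""
    sort_by = _tag(response, "sort_by")
    sort_order = _tag(response, "sort_order")
    try:
        limit = int(_tag(response, "limit"))
    except (TypeError, ValueError):
        # Missing tag (None) or non-numeric text: use an assumed default.
        limit = 20
    return {
        "sort_by": sort_by if sort_by in ALLOWED_SORT_BY else "relevance",
        "sort_order": sort_order if sort_order in ALLOWED_SORT_ORDER else "descending",
        # Clamp to the 10-50 range the prompt specifies.
        "limit": max(10, min(50, limit)),
    }
```

# Clamping after parsing means an out-of-range suggestion degrades to the nearest
# allowed value instead of failing the request.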
# System prompts for each task
ARXIV_TYPE_SYSTEM_PROMPT = """You are an expert at analyzing academic queries.
Your task is to determine whether the query needs an arXiv search at all and, if so, whether it is better suited for keyword search or category-based search.
Consider the query's specificity, scope, and intended search area when making your decision.
Always respond in English regardless of the input language."""
ARXIV_QUERY_SYSTEM_PROMPT = """You are an expert at crafting arXiv search queries.
Your task is to optimize natural language queries using boolean operators and field tags.
Focus on creating precise, targeted queries that will return the most relevant results.
Always generate English search terms regardless of the input language."""
ARXIV_CATEGORIES_SYSTEM_PROMPT = """You are an expert at arXiv category classification.
Your task is to select the most relevant categories for the given research query.
Consider both primary and related interdisciplinary categories, while maintaining focus on the main research area.
Always respond in English regardless of the input language."""
ARXIV_SORT_SYSTEM_PROMPT = """You are an expert at optimizing search results.
Your task is to determine the best sorting parameters based on the query context.
Consider the user's likely intent and temporal aspects of the research topic.
Always respond in English regardless of the input language."""
# New search prompt
ARXIV_SEARCH_PROMPT = """Analyze and optimize the research query for arXiv search.
Query: {query}
Task: Transform the natural language query into an optimized arXiv search query.
Available search options:
1. Basic search with field tags:
- ti: Search in title
- abs: Search in abstract
- au: Search for author
Example: "ti:transformer AND abs:attention"
2. Category-based search:
- Use specific arXiv categories
Example: "cat:cs.AI AND all:\"neural networks\""
3. Date range:
- Specify date range using submittedDate
Example: "deep learning AND submittedDate:[20200101 TO 20231231]"
Examples:
1. Query: "Recent papers about transformer models by Vaswani"
<search_criteria>
<query>ti:"transformer model" AND au:Vaswani AND submittedDate:[20170101 TO 99991231]</query>
<categories>cs.CL, cs.AI, cs.LG</categories>
<sort_by>submittedDate</sort_by>
<sort_order>descending</sort_order>
<limit>30</limit>
</search_criteria>
2. Query: "Latest developments in computer vision"
<search_criteria>
<query>cat:cs.CV AND submittedDate:[20220101 TO 99991231]</query>
<categories>cs.CV, cs.AI, cs.LG</categories>
<sort_by>submittedDate</sort_by>
<sort_order>descending</sort_order>
<limit>25</limit>
</search_criteria>
Please analyze the query and respond with XML tags containing search criteria."""
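# The date-range clause used in the examples above can be generated rather than typed by
# hand. The sketch below is illustrative only: submitted_date_clause is a hypothetical
# helper, and it emits the same 8-digit YYYYMMDD form that the prompt's examples use.

```python
from datetime import date


def submitted_date_clause(start: date, end: date) -> str:
    """Build a submittedDate range clause in the style of the prompt's examples."""
    if start > end:
        raise ValueError("start date must not be after end date")
    return f"submittedDate:[{start:%Y%m%d} TO {end:%Y%m%d}]"


# Combine a category filter with a date range, as in the second example above.
clause = submitted_date_clause(date(2020, 1, 1), date(2023, 12, 31))
query = f"cat:cs.CV AND {clause}"
```

# Building the clause from date objects rules out malformed ranges such as a
# reversed start and end.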
ARXIV_SEARCH_SYSTEM_PROMPT = """You are an expert at crafting arXiv search queries.
Your task is to analyze research queries and transform them into optimized arXiv search criteria.
Consider query intent, relevant categories, and temporal aspects when creating the search parameters.
Always generate English search terms and respond in English regardless of the input language."""
# Categories selection prompt
ARXIV_CATEGORIES_PROMPT = """Select the most relevant arXiv categories for the research query.
Query: {query}
Task: Choose 2-4 most relevant categories that best match the research topic.
Available Categories:
Computer Science (cs):
- cs.AI: Artificial Intelligence (neural networks, machine learning, NLP)
- cs.CL: Computation and Language (NLP, machine translation)
- cs.CV: Computer Vision and Pattern Recognition
- cs.LG: Machine Learning (deep learning, reinforcement learning)
- cs.NE: Neural and Evolutionary Computing
- cs.RO: Robotics
- cs.IR: Information Retrieval
- cs.SE: Software Engineering
- cs.DB: Databases
- cs.DC: Distributed Computing
- cs.CY: Computers and Society
- cs.HC: Human-Computer Interaction
Mathematics (math):
- math.OC: Optimization and Control
- math.PR: Probability
- math.ST: Statistics
- math.NA: Numerical Analysis
- math.DS: Dynamical Systems
Statistics (stat):
- stat.ML: Machine Learning
- stat.ME: Methodology
- stat.TH: Theory
- stat.AP: Applications
Physics (physics):
- physics.comp-ph: Computational Physics
- physics.data-an: Data Analysis
- physics.soc-ph: Physics and Society
Electrical Engineering (eess):
- eess.SP: Signal Processing
- eess.AS: Audio and Speech Processing
- eess.IV: Image and Video Processing
- eess.SY: Systems and Control
Examples:
1. Query: "Deep learning for computer vision"
<categories>cs.CV, cs.LG, stat.ML</categories>
2. Query: "Natural language processing with transformers"
<categories>cs.CL, cs.AI, cs.LG</categories>
3. Query: "Reinforcement learning for robotics"
<categories>cs.RO, cs.AI, cs.LG</categories>
4. Query: "Statistical methods in machine learning"
<categories>stat.ML, cs.LG, math.ST</categories>
Please analyze the query and respond ONLY with XML tags:
<categories>List 2-4 most relevant categories, comma-separated</categories>
Note:
- Choose primary categories first, then add related ones
- Limit to 2-4 most relevant categories
- Order by relevance (most relevant first)
- Use comma and space between categories (e.g., "cs.AI, cs.LG")"""
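# A minimal sketch of parsing the <categories> reply and turning it into a cat: filter.
# The helper names are hypothetical, and the regular expression is only a shape check
# for dotted IDs like those listed above (for example cs.AI or physics.comp-ph); it does
# not confirm that a category actually exists on arXiv.

```python
import re

# Shape check for dotted category IDs such as "cs.AI" or "physics.comp-ph".
CATEGORY_RE = re.compile(r"^[a-z\-]+\.[A-Za-z\-]+$")


def parse_categories(response, max_categories=4):
    """Return up to `max_categories` unique, well-formed category IDs in reply order."""
    match = re.search(r"<categories>(.*?)</categories>", response, re.DOTALL)
    if match is None:
        return []
    categories = []
    for item in match.group(1).split(","):
        candidate = item.strip()
        if CATEGORY_RE.match(candidate) and candidate not in categories:
            categories.append(candidate)
    return categories[:max_categories]


def categories_to_query(categories):
    """Join category IDs into an OR-ed cat: filter."""
    return " OR ".join(f"cat:{c}" for c in categories)
```

# Preserving reply order keeps the "most relevant first" ordering that the prompt
# asks the model to produce.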
# New prompt appended at the end of the file
ARXIV_LATEST_PROMPT = """Determine if the query is requesting latest papers from arXiv.
Query: {query}
Task: Analyze if the query is specifically asking for recent/latest papers from arXiv.
IMPORTANT RULE:
- The query MUST explicitly mention "arXiv" or "arxiv" to be considered a latest arXiv papers request
- Queries only asking for recent/latest papers WITHOUT mentioning arXiv should return false
Indicators for latest papers request:
1. MUST HAVE keywords about arXiv:
- "arxiv"
- "arXiv"
AND
2. Keywords about recency:
- "latest"
- "recent"
- "new"
- "newest"
- "just published"
- "this week/month"
Examples:
1. Latest papers request (Valid):
Query: "Show me the latest AI papers on arXiv"
<is_latest_request>true</is_latest_request>
2. Latest papers request (Valid):
Query: "What are the recent papers about transformers on arxiv"
<is_latest_request>true</is_latest_request>
3. Not a latest papers request (Invalid - no mention of arXiv):
Query: "Show me the latest papers about BERT"
<is_latest_request>false</is_latest_request>
4. Not a latest papers request (Invalid - no recency):
Query: "Find papers on arxiv about transformers"
<is_latest_request>false</is_latest_request>
Please analyze the query and respond ONLY with XML tags:
<is_latest_request>true/false</is_latest_request>
Note: The response should be true ONLY if both conditions are met:
1. Query explicitly mentions arXiv/arxiv
2. Query asks for recent/latest papers"""
ARXIV_LATEST_SYSTEM_PROMPT = """You are an expert at analyzing academic queries.
Your task is to determine if the query is specifically requesting latest/recent papers from arXiv.
Remember: The query MUST explicitly mention arXiv to be considered valid, even if it asks for recent papers.
Always respond in English regardless of the input language."""
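# Because the rule above requires an explicit mention of arXiv, the model's answer can be
# cross-checked against the query text. The sketch below is an assumption about how a
# caller might do that; mentions_arxiv and parse_is_latest are hypothetical names, not
# functions from this repository.

```python
import re


def mentions_arxiv(query: str) -> bool:
    """True if the query names arXiv, in any capitalisation."""
    return "arxiv" in query.lower()


def parse_is_latest(response: str, query: str) -> bool:
    """Accept a 'true' reply only when the query itself mentions arXiv."""
    match = re.search(
        r"<is_latest_request>\s*(true|false)\s*</is_latest_request>",
        response,
        re.IGNORECASE,
    )
    model_says_latest = bool(match) and match.group(1).lower() == "true"
    # Enforce the arXiv-mention rule in code as well as in the prompt.
    return model_says_latest and mentions_arxiv(query)
```

# A missing or malformed tag is treated as false, so the stricter branch is taken
# whenever the reply cannot be read.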