🔍 arXiv Paper Search & Download Tool

A Python tool for searching, parsing, and downloading research papers from arXiv through its public API. It is aimed at researchers, students, and anyone else who works with academic papers.

📋 Table of Contents

Features
Installation
Quick Start
Usage Modes
API Functions
Examples
Advanced Usage
Troubleshooting

✨ Features

🔍 Smart Search: Search arXiv papers by title, author, abstract, or any keyword
📥 Smart Download: Download PDFs with automatic filename renaming to paper titles
📊 Result Parsing: Automatically extract structured information (title, authors, abstract, ID)
🖥️ Interactive Mode: Command-line interface for easy searching and downloading
⚡ Batch Operations: Search multiple papers and download in sequence
📈 Academic Research: Perfect for literature reviews and research discovery
🔄 Auto-Rename: Downloaded files are automatically named using paper titles instead of cryptic IDs

🚀 Installation

Prerequisites

Python 3.6 or higher
Internet connection for API access

Install Dependencies

pip install requests

Download the Script

# Clone or download arxiv_api.py to your working directory

🎯 Quick Start

Basic Search

python arxiv_api.py "machine learning"

Search with Custom Results

python arxiv_api.py "quantum computing" -n 10

Search and Download First Result

python arxiv_api.py "deep learning" -d

Interactive Mode

python arxiv_api.py -i

Download Paper by ID (with auto-rename)

# In interactive mode:
# 📚 arxiv> download 2502.05218v1
# This will automatically rename the file to the paper's title

🎮 Usage Modes

1. Command Line Mode

Direct search queries from the command line.

Syntax:

python arxiv_api.py [query] [options]

Options:

-n, --max_results: Maximum number of results (default: 5)
-d, --download: Download the first result automatically
-i, --interactive: Start interactive mode
-h, --help: Show help message
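
The options above map naturally onto `argparse`. The following is a minimal sketch of such a parser, not the code in `arxiv_api.py`; the function name `build_parser` and the help texts are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Search and download arXiv papers")
    # The query is optional so that `-i` can be used on its own.
    parser.add_argument("query", nargs="?", help="search query string")
    parser.add_argument("-n", "--max_results", type=int, default=5,
                        help="maximum number of results (default: 5)")
    parser.add_argument("-d", "--download", action="store_true",
                        help="download the first result automatically")
    parser.add_argument("-i", "--interactive", action="store_true",
                        help="start interactive mode")
    return parser


args = build_parser().parse_args(["quantum computing", "-n", "10"])
print(args.query, args.max_results, args.download)
```

`argparse` supplies `-h, --help` automatically, so it does not need to be declared.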

2. Interactive Mode

Interactive command-line interface for multiple operations.

Commands:

search <query> [max_results]: Search for papers
download <paper_id>: Download a specific paper (with auto-rename)
help: Show available commands
quit/exit: Exit the program
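
To make the command grammar above concrete, here is a hedged sketch of how one line of interactive input could be split into a command and its arguments. `parse_command` is a hypothetical helper, and the default of 5 results is an assumption borrowed from the `-n` option; the actual script may behave differently.

```python
def parse_command(line):
    """Split one line of interactive input into (command, arguments).

    Hypothetical helper: returns ("search", (query, max_results)),
    ("download", (paper_id,)), ("help", ()), ("quit", ()),
    or ("unknown", (original_text,)) for anything else.
    """
    parts = line.strip().split()
    if not parts:
        return ("unknown", ("",))
    command, rest = parts[0].lower(), parts[1:]
    if command in ("quit", "exit"):
        return ("quit", ())
    if command == "help":
        return ("help", ())
    if command == "download" and len(rest) == 1:
        return ("download", (rest[0],))
    if command == "search" and rest:
        max_results = 5  # assumed default, matching the -n option
        # A trailing integer is treated as max_results only if a query precedes it.
        if len(rest) > 1 and rest[-1].isdigit():
            max_results = int(rest[-1])
            rest = rest[:-1]
        return ("search", (" ".join(rest), max_results))
    return ("unknown", (line.strip(),))


print(parse_command("search blockchain finance 5"))
print(parse_command("download 2502.05218v1"))
```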

🔧 API Functions

Core Functions

`search_arxiv(query, max_results=10)`

Searches arXiv for papers using the public API.

Parameters:

query (str): Search query string
max_results (int): Maximum number of results (default: 10)

Returns:

str: XML response from arXiv API

Example:

from arxiv_api import search_arxiv

results = search_arxiv("artificial intelligence", max_results=5)
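
For readers curious about what such a search request looks like on the wire, the sketch below builds a query URL for the arXiv API. `search_query`, `start`, and `max_results` are the API's documented parameters; the helper name `build_search_url` and the `all:` prefix in the example are assumptions, since the README does not show how `search_arxiv` constructs its request.

```python
from urllib.parse import urlencode

ARXIV_API_URL = "http://export.arxiv.org/api/query"


def build_search_url(query, max_results=10, start=0):
    """Return an arXiv API URL for the given query (hypothetical helper)."""
    params = {
        "search_query": query,       # e.g. "all:artificial intelligence"
        "start": start,              # index of the first result to return
        "max_results": max_results,  # number of results to return
    }
    # urlencode percent-encodes the query, so spaces and colons are safe.
    return f"{ARXIV_API_URL}?{urlencode(params)}"


print(build_search_url("all:artificial intelligence", max_results=5))
```

The resulting URL can be fetched with `requests.get(url, timeout=30).text`, which yields the Atom XML string that the Returns entry above describes.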

`get_paper_metadata(paper_id)`

Fetches paper metadata directly from arXiv API using paper ID.

Parameters:

paper_id (str): arXiv paper ID (e.g., "2502.05218v1")

Returns:

dict: Paper information dictionary, or None if not found

Example:

from arxiv_api import get_paper_metadata

paper_info = get_paper_metadata("2502.05218v1")
if paper_info:
    print(f"Title: {paper_info['title']}")
    print(f"Authors: {', '.join(paper_info['authors'])}")

`download_paper(paper_id, output_dir=".", paper_title=None)`

Downloads a specific paper by its arXiv ID and automatically renames it to the paper title.

Parameters:

paper_id (str): arXiv paper ID (e.g., "2502.05218v1")
output_dir (str): Output directory (default: current directory)
paper_title (str): Paper title for filename (optional, will be fetched automatically if not provided)

Returns:

str: File path of downloaded PDF, or None if failed

Features:

Auto-rename: Automatically renames downloaded files to paper titles
Smart cleaning: Removes special characters and limits filename length
Fallback: Uses paper ID if title is unavailable

Example:

from arxiv_api import download_paper

# Download with automatic title fetching and renaming
filepath = download_paper("2502.05218v1")

# Download with custom title
filepath = download_paper("2502.05218v1", paper_title="My Custom Title")
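
The cleaning and fallback behavior listed under Features can be sketched as follows. This is an illustration under stated assumptions, not the script's own routine: `title_to_filename` is a hypothetical name, the 100-character limit is taken from the Troubleshooting section and is assumed to apply to the name before the `.pdf` extension, and the exact set of removed characters is a guess.

```python
import re

MAX_FILENAME_LENGTH = 100  # limit quoted in the Troubleshooting section


def title_to_filename(title, paper_id, max_length=MAX_FILENAME_LENGTH):
    """Turn a paper title into a safe PDF filename, falling back to the paper ID."""
    # Keep letters, digits, underscores, whitespace, and hyphens; drop the rest.
    cleaned = re.sub(r"[^\w\s-]", "", title or "")
    # Collapse each run of whitespace into a single underscore.
    cleaned = re.sub(r"\s+", "_", cleaned.strip())
    # Truncate, then trim any separator left dangling at the cut.
    cleaned = cleaned[:max_length].rstrip("_-")
    # Fall back to the paper ID when nothing usable remains.
    stem = cleaned or paper_id.replace("/", "_")
    return stem + ".pdf"


print(title_to_filename("FactorGCL: A Hypergraph-Based Factor Model", "2502.05218v1"))
print(title_to_filename(None, "2502.05218v1"))
```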

`parse_search_results(xml_content)`

Parses XML search results and extracts structured paper information.

Parameters:

xml_content (str): XML response from arXiv API

Returns:

list: List of dictionaries containing paper information

Paper Information Structure:

{
    'title': 'Paper Title',
    'authors': ['Author 1', 'Author 2'],
    'abstract': 'Paper abstract...',
    'paper_id': '2502.05218v1',
    'published': '2025-02-05T12:37:15Z'
}
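
As a rough illustration of how an Atom response could be turned into the structure above, the sketch below uses the standard library's `xml.etree.ElementTree`. It is not the body of `parse_search_results`; the name `parse_entries` and the trimmed-down sample feed are assumptions, and a real response carries more elements per entry.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # namespace used by arXiv's Atom feed


def parse_entries(xml_content):
    """Extract one dictionary per <entry> element (hypothetical helper)."""
    root = ET.fromstring(xml_content)
    papers = []
    for entry in root.iter(f"{ATOM}entry"):
        id_url = entry.findtext(f"{ATOM}id", default="")
        title = entry.findtext(f"{ATOM}title", default="")
        summary = entry.findtext(f"{ATOM}summary", default="")
        papers.append({
            # Titles and abstracts may wrap across lines; normalize whitespace.
            "title": " ".join(title.split()),
            "authors": [a.findtext(f"{ATOM}name", default="").strip()
                        for a in entry.findall(f"{ATOM}author")],
            "abstract": " ".join(summary.split()),
            # The <id> element is a URL ending in /abs/<paper_id>.
            "paper_id": id_url.strip().rsplit("/abs/", 1)[-1],
            "published": entry.findtext(f"{ATOM}published", default="").strip(),
        })
    return papers


SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/2502.05218v1</id>
    <published>2025-02-05T12:37:15Z</published>
    <title>Paper
      Title</title>
    <summary>Paper abstract...</summary>
    <author><name>Author 1</name></author>
    <author><name>Author 2</name></author>
  </entry>
</feed>"""

print(parse_entries(SAMPLE))
```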

`search_and_download(query, max_results=5, download_first=False)`

Combined function that searches for papers and optionally downloads the first result.

Parameters:

query (str): Search query string
max_results (int): Maximum number of results (default: 5)
download_first (bool): Whether to download first result (default: False)

Example:

from arxiv_api import search_and_download

# Search and display results only
search_and_download("machine learning", max_results=3)

# Search and download first result (with auto-rename)
search_and_download("deep learning", max_results=5, download_first=True)

Interactive Mode Functions

`interactive_mode()`

Starts the interactive command-line interface.

Features:

Command history
Error handling
User-friendly prompts
Multiple search sessions
Smart download with auto-rename

📚 Examples

Example 1: Basic Paper Search

# Search for machine learning papers
python arxiv_api.py "machine learning"

# Output:
# Searching arXiv for: 'machine learning'
# --------------------------------------------------
# Found 5 papers:
# 
# 1. Title: Introduction to Machine Learning
#    Authors: John Doe, Jane Smith
#    Paper ID: 2103.12345
#    Published: 2021-03-15T10:30:00Z
#    Abstract: This paper introduces...

Example 2: Search with Custom Results

# Get 10 results for quantum computing
python arxiv_api.py "quantum computing" -n 10

Example 3: Search and Download (with auto-rename)

# Search for papers and download the first one
python arxiv_api.py "artificial intelligence" -d
# Downloaded file will be automatically renamed to the paper title

Example 4: Interactive Mode with Smart Download

python arxiv_api.py -i

# 📚 arxiv> search blockchain finance 5
# 📚 arxiv> download 2502.05218v1
# Fetching paper information for 2502.05218v1...
# Found paper: FactorGCL: A Hypergraph-Based Factor Model...
# Downloaded: .\FactorGCL_A_Hypergraph-Based_Factor_Model...pdf
# 📚 arxiv> help
# 📚 arxiv> quit

Example 5: Python Script Integration

from arxiv_api import search_and_download, download_paper, get_paper_metadata

# Search for papers on a specific topic
search_and_download("quantitative finance China", max_results=3)

# Download a specific paper with auto-rename
download_paper("2502.05218v1")

# Get paper metadata
paper_info = get_paper_metadata("2502.05218v1")
if paper_info:
    print(f"Title: {paper_info['title']}")

🔍 Advanced Usage

Smart Download Features

Automatic Filename Generation

from arxiv_api import download_paper

# The tool automatically:
# 1. Fetches paper metadata
# 2. Extracts the title
# 3. Cleans the title for filename use
# 4. Downloads and renames the file

# Example output filename:
# "FactorGCL_A_Hypergraph-Based_Factor_Model_with_Temporal_Residual_Contrastive_Learning_for_Stock_Returns_Prediction.pdf"

Custom Search Queries

Field-Specific Searches

# Search by author
python arxiv_api.py "au:Yann LeCun"

# Search by title
python arxiv_api.py "ti:deep learning"

# Search by abstract
python arxiv_api.py "abs:neural networks"

# Search by category
python arxiv_api.py "cat:cs.AI"

Complex Queries

# Multiple terms
python arxiv_api.py "machine learning AND neural networks"

# Exclude terms (the arXiv API's exclusion operator is ANDNOT)
python arxiv_api.py "deep learning ANDNOT reinforcement"

# Date range (timestamps use the form YYYYMMDDHHMM, in GMT)
python arxiv_api.py "machine learning AND submittedDate:[202301010000 TO 202312312359]"
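
When queries are assembled in Python rather than typed by hand, small helpers can keep the syntax consistent. The sketch below is one possible approach, assuming the query string is passed to the arXiv API unchanged; `field`, `date_range`, and `combine` are hypothetical names. The arXiv API documents the Boolean operators `AND`, `OR`, and `ANDNOT`, double quotes for phrases, and `YYYYMMDDHHMM` timestamps (GMT) for `submittedDate`.

```python
def field(prefix, value):
    """Return a field-scoped term; phrases are quoted so they stay together."""
    return f'{prefix}:"{value}"' if " " in value else f"{prefix}:{value}"


def date_range(start, end):
    """Return a submittedDate filter from two YYYYMMDDHHMM timestamps (GMT)."""
    return f"submittedDate:[{start} TO {end}]"


def combine(operator, *terms):
    """Join terms with one of the operators the arXiv API documents."""
    if operator not in ("AND", "OR", "ANDNOT"):
        raise ValueError(f"unsupported operator: {operator}")
    return f" {operator} ".join(terms)


query = combine(
    "AND",
    field("ti", "deep learning"),
    field("cat", "cs.AI"),
    date_range("202301010000", "202312312359"),
)
print(query)
```

The resulting string can be passed as the `query` argument of `search_arxiv`.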

Batch Operations

Download Multiple Papers with Auto-Rename

from arxiv_api import search_arxiv, parse_search_results, download_paper

# Search for papers
query = "quantum computing"
results = search_arxiv(query, max_results=10)
papers = parse_search_results(results)

# Download all papers (each will be automatically renamed)
for paper in papers:
    paper_id = paper.get('paper_id')
    if paper_id:
        download_paper(paper_id, output_dir="./quantum_papers")

Custom Output Formatting

from arxiv_api import search_arxiv, parse_search_results

# Custom display function
def custom_display(papers):
    for i, paper in enumerate(papers, 1):
        print(f"📄 Paper {i}: {paper['title']}")
        print(f"👥 Authors: {', '.join(paper['authors'])}")
        print(f"🆔 ID: {paper['paper_id']}")
        print(f"📅 Date: {paper['published']}")
        print(f"📝 Abstract: {paper['abstract'][:150]}...")
        print("-" * 80)

# Fetch and parse the results, then pass them to the custom display
papers = parse_search_results(search_arxiv("blockchain", max_results=3))
custom_display(papers)

🛠️ Troubleshooting

Common Issues

1. No Results Found

Problem: Search returns no papers

Solution:

Check spelling and use broader terms
Try different keyword combinations
Verify internet connection

2. Download Failed

Problem: Paper download fails

Solution:

Verify paper ID is correct
Check if paper exists on arXiv
Ensure write permissions in output directory

3. API Rate Limiting

Problem: Too many requests

Solution:

Wait at least three seconds between requests, the interval arXiv's API terms of use ask for
Reduce batch size
Use interactive mode for multiple searches
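
One way to space out requests in a batch script is a small throttle object, sketched below. `Throttle` is a hypothetical helper that is not part of `arxiv_api.py`; the three-second default follows arXiv's API terms of use, which ask for no more than one request every three seconds. The clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum gap between successive calls to wait()."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle()
throttle.wait()  # the first call returns immediately
```

Calling `throttle.wait()` immediately before each `search_arxiv` or `download_paper` call in a loop keeps the whole batch within the limit.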

4. XML Parsing Errors

Problem: Error parsing search results

Solution:

Check internet connection
Verify API response format
Update the script if needed

5. Filename Too Long

Problem: Generated filename exceeds system limits

Solution:

The tool automatically limits filenames to 100 characters
Special characters are automatically cleaned
Fallback to paper ID if title is unavailable

Error Messages

Error: Failed to download paper 2502.05218v1

Paper ID may not exist
Network connection issue
arXiv server problem

Error parsing XML: ...

Malformed API response
Network interruption
API format change

Could not find paper information for 2502.05218v1

Paper ID may be invalid
arXiv API issue
Network connectivity problem

📖 API Reference

arXiv API Endpoints

Search API: http://export.arxiv.org/api/query
Metadata API: http://export.arxiv.org/api/query?id_list={paper_id}
Documentation: https://arxiv.org/help/api
Rate Limits: arXiv's API terms of use ask for no more than one request every three seconds, over a single connection

Data Fields Available

Title: Paper title
Authors: List of author names
Abstract: Paper abstract
Paper ID: Unique arXiv identifier
Published Date: Publication timestamp
Categories: arXiv subject categories

Paper ID Format

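Format: YYMM.NNNNN, optionally followed by a version suffix vN (identifiers issued before 2015 have four digits after the dot)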
Example: 2502.05218v1
Download URL: https://arxiv.org/pdf/{paper_id}.pdf
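
A paper ID can be checked against this format before any request is made. The sketch below is an assumption-laden illustration: `parse_paper_id` and `pdf_url` are hypothetical helpers, and the pattern covers only the `YYMM.NNNNN` scheme with an optional version suffix, not the older `archive/YYMMNNN` identifiers.

```python
import re

# Two-digit year, two-digit month, a dot, 4 or 5 digits, optional version suffix.
ARXIV_ID = re.compile(r"(\d{2})(\d{2})\.(\d{4,5})(v\d+)?")


def parse_paper_id(paper_id):
    """Return the parts of a new-style arXiv ID, or None if it does not match."""
    match = ARXIV_ID.fullmatch(paper_id)
    if not match:
        return None
    year, month, number, version = match.groups()
    if not 1 <= int(month) <= 12:
        return None
    return {"yymm": year + month, "number": number, "version": version}


def pdf_url(paper_id):
    """Build the download URL using the pattern given above."""
    return f"https://arxiv.org/pdf/{paper_id}.pdf"


print(parse_paper_id("2502.05218v1"))
print(pdf_url("2502.05218v1"))
```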

Smart Download Features

Automatic Metadata Fetching: Gets paper information before download
Intelligent Filename Generation: Converts paper titles to valid filenames
Character Cleaning: Removes special characters and replaces spaces with underscores
Length Limiting: Ensures filenames don't exceed system limits
Fallback Naming: Uses paper ID if title is unavailable

🤝 Contributing

Adding New Features

Fork the repository
Create a feature branch
Implement your changes
Add tests and documentation
Submit a pull request

Reporting Issues

Check existing issues first
Provide detailed error messages
Include system information
Describe steps to reproduce

📄 License

This project is open source and available under the MIT License.

🙏 Acknowledgments

arXiv: For providing the public API
Python Community: For excellent libraries and tools
Researchers: For contributing to open science

📞 Support

Getting Help

Check this documentation first
Review the examples section
Search existing issues
Create a new issue for bugs

Useful Links

Happy Researching! 🎓📚

This tool makes academic research more accessible and efficient. Use it responsibly and respect arXiv's terms of service.
