Web Search Feature
nGPT includes a web search capability that adds current information from the web to your prompts before they reach the model. This feature is particularly useful for:
- Getting up-to-date information that may not be in the model’s training data
- Researching specific topics with authoritative sources
- Providing context for questions about current events
- Fact-checking and verification
Using Web Search
To enable web search in nGPT, use the --web-search or --web flag:
# Basic web search query
ngpt --web-search "What are the latest developments in quantum computing?"
# Interactive mode with web search enabled
ngpt -i --web-search
# With code generation
ngpt --code --web-search "Create a function to calculate Bitcoin's current price"
How It Works
When you enable web search:
- nGPT uses DuckDuckGo to search for relevant information
- The search results (typically 5 sources) are processed to extract the most relevant content
- This information is added to your prompt before sending it to the AI model
- The model can then reference this information in its response
By default, the model will include numbered citations to reference the sources it used.
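The prompt-augmentation step described above can be sketched roughly as follows. This is an illustration only, not nGPT's actual code: the `Source` class, the `build_augmented_prompt` function, the instruction wording, and the `max_chars` limit are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One extracted search result (hypothetical structure, not nGPT's)."""
    url: str
    text: str


def build_augmented_prompt(question: str, sources: list[Source], max_chars: int = 500) -> str:
    """Prepend numbered source excerpts to the user's question."""
    lines = ["Use the sources below and cite them as [n].", ""]
    for n, src in enumerate(sources, start=1):
        lines.append(f"[{n}] {src.url}")
        lines.append(src.text[:max_chars])  # truncate long pages (assumed limit)
        lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


sources = [
    Source("https://example.com/a", "First page text."),
    Source("https://example.com/b", "Second page text."),
]
prompt = build_augmented_prompt("What changed?", sources)
print(prompt)
```

Numbering the sources in the prompt is what lets the model refer back to them with matching citation markers in its answer.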
Advanced Content Extraction
nGPT uses a sophisticated content extraction algorithm that:
- Analyzes content density to identify the main content of web pages
- Filters out boilerplate content like navigation menus, ads, and sidebars
- Prioritizes semantic content blocks based on:
  - Text-to-HTML ratio
  - Link density (lower is better)
  - Paragraph density and structure
  - Content indicators in HTML attributes
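To make two of these signals concrete, here is a minimal sketch that computes a text-to-HTML ratio and a link density for an HTML fragment. It uses only the standard library's `html.parser` so that it runs without extra packages. The `BlockStats` class, the `score_block` function, and the exact formulas are assumptions made for illustration and may differ from what nGPT does internally.

```python
from html.parser import HTMLParser


class BlockStats(HTMLParser):
    """Counts visible text characters and how many of them sit inside <a> tags."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.link_chars = 0
        self._link_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._link_depth > 0:
            self._link_depth -= 1

    def handle_data(self, data):
        n = len(data.strip())
        self.text_chars += n
        if self._link_depth > 0:
            self.link_chars += n


def score_block(html: str) -> dict:
    """Return the two hypothetical signals for one HTML fragment."""
    stats = BlockStats()
    stats.feed(html)
    stats.close()
    # Share of the markup that is visible text (higher suggests real content).
    text_ratio = stats.text_chars / len(html) if html else 0.0
    # Share of the visible text that is link text (lower is better).
    link_density = stats.link_chars / stats.text_chars if stats.text_chars else 1.0
    return {"text_ratio": text_ratio, "link_density": link_density}


nav = '<ul><li><a href="/">Home</a></li><li><a href="/about">About</a></li></ul>'
article = "<p>Quantum error correction improved this year, according to <a href='/r'>one report</a>.</p>"

nav_score = score_block(nav)
article_score = score_block(article)
print(nav_score, article_score)
```

In this example the navigation fragment consists entirely of link text, while the article paragraph is mostly plain text, which is the kind of contrast a density-based extractor relies on.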
The extractor is built on Python’s standard library and BeautifulSoup with the built-in html.parser, and is designed for:
- High-quality content extraction
- Fast performance
- Minimal dependencies
- Accurate identification of main content versus navigation/boilerplate
- Special handling for popular sites like Wikipedia and major news outlets
Configuration Options
You can set web search as your default:
# Enable web search by default
ngpt --cli-config set web-search true
Example Output
When using web search, the model will typically include citations:
According to recent studies, quantum computing has seen several breakthroughs in 2024 [1].
IBM announced a new 1,000-qubit processor in March [2], while Google has demonstrated
quantum advantage in a practical application for the first time [3].
References:
> [1] https://example.com/quantum-computing-advances-2024
>
> [2] https://research.ibm.com/blog/1000-qubit-processor
>
> [3] https://ai.googleblog.com/2024/quantum-advantage-practical
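If you post-process responses, a short script can check that every citation marker has a matching entry in the reference list. The sketch below assumes the output layout shown above (inline `[n]` markers followed by a `References:` section). `check_citations` is a hypothetical helper, not part of nGPT.

```python
import re


def check_citations(response: str) -> dict:
    """Compare citation markers in the body against the listed references."""
    body, _, refs = response.partition("References:")
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", body)}
    listed = {int(n): url for n, url in re.findall(r"\[(\d+)\]\s+(\S+)", refs)}
    return {
        "missing": sorted(cited - set(listed)),  # cited but not listed
        "unused": sorted(set(listed) - cited),   # listed but never cited
        "references": listed,
    }


sample = """Quantum computing advanced in 2024 [1]. A new processor was announced [2].
References:
> [1] https://example.com/one
> [2] https://example.com/two
> [3] https://example.com/three
"""

report = check_citations(sample)
print(report)
```

A check like this only confirms that the numbering is consistent. It does not confirm that a source actually supports the claim attached to it.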
Code Generation with Web Search
When combined with code generation, web search can be especially powerful:
# Generate code for a current API with web search
ngpt --code --web-search "Create a function to query the latest Hacker News API"
The model can then draw on current API documentation found by the search, which tends to produce more accurate code samples.
Limitations
- The quality of the response depends on the quality of the results the search engine returns
- The extraction process may occasionally miss content from websites with unusual structures
- The feature requires internet connectivity
- Results may vary based on region and search availability
Performance Considerations
The web search feature adds a small amount of latency to requests as it needs to:
- Perform the search query
- Download and process the top results
- Extract relevant content
- Format the information for the model
However, the benefits of having up-to-date information typically outweigh this slight increase in response time.