Build Your Own Search Result Scraper with Markdown Output Using FastAPI, SearXNG, and Browserless

> Learn how to build your own search result scraper with FastAPI, SearXNG, and Browserless, and return results in Markdown format using a proxy.

Today, I'm excited to share with you a detailed guide on how to build your own search result scraper that returns results in Markdown format. We'll be using FastAPI, SearXNG, and Browserless, and we'll run everything in Docker containers.

This tutorial is perfect for beginners or anyone interested in web scraping and data extraction. By the end of this guide, you'll have a working application that can fetch search results, scrape content, and convert it into Markdown format, all while routing requests through a proxy.

Table of Contents

  • Services We'll Use
  • Purpose of Scraping
  • Prerequisites
  • Docker Setup
  • Manual Setup
  • Writing the Code
  • Running SearXNG and Browserless in Docker
  • Using Proxies
  • Full Source Code

Services We'll Use

  • FastAPI: A modern, high-performance web framework for building APIs with Python 3.6+.
  • SearXNG: A free internet metasearch engine that aggregates results from various search services and databases.
  • Browserless: A web browser automation service that lets you scrape web pages without managing a browser directly.

Purpose of Scraping

Web scraping allows you to extract useful information from websites and use it for various purposes like data analysis, content aggregation, and more. In this tutorial, we'll focus on scraping search results and converting them into Markdown format for easy readability and integration with other tools.

Prerequisites

Before we begin, make sure you have the following installed:

  • Python 3.11
  • Virtualenv

You can install the prerequisites using the following commands:
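The original commands are not reproduced here; on a Debian-based system, installation might look like this (package names vary by platform):

```shell
# Install Python 3.11 and the venv module (Debian/Ubuntu package names)
sudo apt install python3.11 python3.11-venv

# Install virtualenv
pip install virtualenv
```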

Docker Setup

You can use Docker to simplify the setup process. Follow these steps:

  1. Clone the repository:

  2. Run Docker Compose:
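The repository URL is not included above; with a placeholder substituted in, the two steps might look like:

```shell
# 1. Clone the repository (replace <repository-url> with the actual repo)
git clone <repository-url>
cd <repository-directory>

# 2. Build and start the containers
# (older Docker installs use the hyphenated `docker-compose` command)
docker compose up --build
```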

With this setup, if you change the .env or main.py file, you no longer need to restart Docker. Changes will be reloaded automatically.

Manual Setup

Follow these steps for manual setup:

  1. Clone the repository:

  2. Create and activate virtual environment:

  3. Install dependencies:

  4. Create a .env file in the root directory with the following content:
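The original commands are not shown above; steps 1 through 3 might look like this (the repository URL is a placeholder), with the .env content for step 4 covered in the ".env File" section below:

```shell
# 1. Clone the repository (replace <repository-url> with the actual repo)
git clone <repository-url>
cd <repository-directory>

# 2. Create and activate a virtual environment
virtualenv venv --python=python3.11
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt
```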

Writing the Code

Here's the complete code for our FastAPI application:

.env File
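The file's contents are not reproduced above; based on the variable descriptions that follow, it might look like this (every value is a placeholder for your own settings and credentials):

```env
SEARXNG_URL=http://localhost:8888
BROWSERLESS_URL=http://localhost:3000
TOKEN=your_token_here
PROXY_PROTOCOL=http
PROXY_URL=your_proxy_host
PROXY_USERNAME=your_geonode_username
PROXY_PASSWORD=your_geonode_password
PROXY_PORT=9000
REQUEST_TIMEOUT=30
```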

Explanation of Each Variable

  • SEARXNG_URL: This is the URL where your SearXNG service is running. In this setup, it's running locally on port 8888.
  • BROWSERLESS_URL: This is the URL where your Browserless service is running. In this setup, it's running locally on port 3000.
  • TOKEN: This is a placeholder for any API token that might be required by your services. In this specific example, it's not actively used but can be kept for future use or services that require authentication.
  • PROXY_PROTOCOL: The protocol used by your proxy service. Typically, this will be either http or https.
  • PROXY_URL: The URL or IP address of your proxy service provider. Here, we're using a Geonode proxy.
  • PROXY_USERNAME: The username for authenticating with your proxy service. This is specific to your Geonode account.
  • PROXY_PASSWORD: The password for authenticating with your proxy service. This is specific to your Geonode account.
  • PROXY_PORT: The port number on which your proxy service is running. Common ports include 8080 and 9000.
  • REQUEST_TIMEOUT: The timeout duration for HTTP requests, specified in seconds. This helps ensure your application doesn't hang indefinitely while waiting for a response.

Why Use a .env File?

  1. Security: Keeps sensitive information like API keys, tokens, and passwords out of your codebase.
  2. Configuration: Allows easy configuration changes without modifying the code.
  3. Environment-Specific Settings: Easily switch configurations between different environments (development, testing, production) by changing the .env file.

How to Use the .env File

  1. Create the .env file: In the root of your project directory, create a file named .env.
  2. Add your variables: Copy the variables listed above into your .env file, replacing the example values with your actual values.
  3. Load the .env file in your code: Use python-dotenv to load these variables into your application.
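Conceptually, python-dotenv reads KEY=VALUE pairs from the file and exports them into the process environment. The sketch below imitates that with the standard library only, so you can see what `load_dotenv()` does under the hood; in the real app you would simply call `load_dotenv()` from python-dotenv:

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Minimal stand-in for python-dotenv's load_dotenv():
    parse KEY=VALUE lines and export them into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            # setdefault: don't clobber variables already set in the shell
            os.environ.setdefault(key.strip(), value.strip())

# Example: write a tiny .env-style file and load it
with open(".env.example", "w") as fh:
    fh.write("SEARXNG_URL=http://localhost:8888\nREQUEST_TIMEOUT=30\n")

load_env_file(".env.example")
print(os.environ["SEARXNG_URL"])  # → http://localhost:8888
```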

main.py

Running SearXNG and Browserless in Docker

Next, let's set up and run the SearXNG and Browserless services using Docker. Create a shell script run-services.sh with the following content:
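The script itself is not shown above; a minimal version, assuming the default ports from the .env file and the public images for each service, could look like:

```shell
#!/bin/sh
# Run SearXNG, exposing its internal port 8080 on localhost:8888
docker run -d --name searxng -p 8888:8080 searxng/searxng

# Run Browserless on localhost:3000
docker run -d --name browserless -p 3000:3000 browserless/chrome
```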

Make the script executable and run it:
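For example:

```shell
chmod +x run-services.sh
./run-services.sh
```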

This script will pull and run the SearXNG and Browserless Docker images, making them accessible on your local machine.

Using Proxies

In this tutorial, I'm using Geonode proxies to scrape content. You can use my referral link to get started with their proxy services.

Full Source Code

You can find the full source code for this project on my GitHub.

Enjoyed the Post?

I hope you found this tutorial helpful! Feel free to reach out if you have any questions or need further assistance. Happy scraping!


Follow my blog for more tutorials and insights.