Free Internal Links Checker SEO Tool - Create Your Own
Want to check how many internal links point to a
specific page? Are online SEO tools not giving you the right
results? It's time to create your own free SEO tool
in Python that checks the internal links to any
page you provide.

Purpose of the Code:
The purpose of this free internal links checker SEO tool is to crawl a website recursively, following its internal links, to find which pages link to a specific target URL. This can be useful for:
- SEO purposes (finding where a particular page is linked from).
- Web analysis (mapping internal link structure).
This free SEO tool written in Python scrapes and crawls a website to find and list the pages that contain a link to a given target URL. Here's a detailed explanation of what the code does:
1. Imports and Setup:
- requests: Used to send HTTP requests and fetch the content of web pages.
- BeautifulSoup: From the bs4 library, it's used for parsing HTML and navigating the structure of the page to extract data.
- urljoin and urlparse: From Python 3's urllib.parse module (formerly the urlparse module in Python 2), these functions are used to handle and resolve URLs, particularly relative URLs.
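A quick standalone illustration of how these two helpers behave (the /about.html path is just for illustration):
from urllib.parse import urljoin, urlparse

# Relative hrefs such as "/about.html" are resolved against the base URL
print(urljoin("https://newscurrentaffairs.info", "/about.html"))
# -> https://newscurrentaffairs.info/about.html

# netloc is the domain part of a URL, used later to tell internal links from external ones
print(urlparse("https://newscurrentaffairs.info/free-tools/").netloc)
# -> newscurrentaffairs.info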
2. Global Variables:
- base_url: The root URL of the website (https://newscurrentaffairs.info) that the crawler will start from.
- target_url: A specific URL (https://newscurrentaffairs.info/free-tools/free-google-index-checker-seo-tool-create-your-own.html) that the script looks for in the pages it crawls.
- visited: A set that stores URLs that have already been crawled to avoid revisiting them.
- pages_with_target: A list that stores URLs of pages that contain a link to the target URL.
3. crawl(url) Function:
This is a recursive function that crawls the website starting from the given url.
Base condition: If the URL has already been visited or does not belong to the base domain (base_url), the function returns without doing anything.
Sending HTTP Request:
- A GET request is sent to the URL using requests.get.
- If the request is successful (response.raise_for_status() raises an error for invalid responses), the URL is added to the visited set.
Parsing the HTML:
The HTML content of the page is parsed using BeautifulSoup.
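A minimal fetch-and-parse round trip looks like this, assuming the homepage is reachable:
import requests
from bs4 import BeautifulSoup

response = requests.get("https://newscurrentaffairs.info", timeout=10)
response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
soup = BeautifulSoup(response.text, "html.parser")
print(soup.title)  # the page's <title> element, if present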
Checking for the Target URL:
- The code looks for an anchor (<a>) tag with an href attribute that matches the target_url.
- If such a link is found, the current page URL is added to the pages_with_target list.
- Note that this is an exact match on the href string, so relative links to the target page are not detected.
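For example, against a small hypothetical HTML snippet (the page.html URLs are just for illustration):
from bs4 import BeautifulSoup

html = '<p><a href="https://newscurrentaffairs.info/page.html">a link</a></p>'
soup = BeautifulSoup(html, "html.parser")
# find() returns the first matching tag, or None if no href matches exactly
print(soup.find("a", href="https://newscurrentaffairs.info/page.html"))  # the <a> tag
print(soup.find("a", href="https://newscurrentaffairs.info/other.html"))  # None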
Crawling Internal Links:
- The function then loops over all anchor (<a>) tags on the page, extracts their href attributes (links), and resolves any relative URLs using urljoin.
- It checks whether the full URL is internal by comparing the network location (domain) part of the URL (urlparse(full_url).netloc) with that of the base_url.
- If the URL is internal, it recursively calls crawl to visit that page.
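The href=True filter keeps only anchors that actually carry an href attribute; a small hypothetical example:
from bs4 import BeautifulSoup

html = '<a href="/a.html">A</a><a name="anchor-only">B</a><a href="https://example.com/">C</a>'
soup = BeautifulSoup(html, "html.parser")
# <a> tags without an href attribute are skipped
for link in soup.find_all("a", href=True):
    print(link["href"])
# -> /a.html
# -> https://example.com/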
Error Handling:
The function handles exceptions such as network errors using a try-except block. If an error occurs while crawling a URL (e.g., the page is unreachable), it prints an error message and continues.
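requests.RequestException is the common base class for connection errors, timeouts, and the HTTP errors raised by raise_for_status, so a single except clause covers them all (the /no-such-page path is just for illustration):
import requests

try:
    response = requests.get("https://newscurrentaffairs.info/no-such-page", timeout=10)
    response.raise_for_status()
except requests.RequestException as e:
    # Connection failures, timeouts, and 4xx/5xx responses all land here
    print(f"Error crawling page: {e}")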
4. Main Execution:
- The script starts by calling the crawl function on the base_url, which begins the crawling process.
- After the crawl finishes, the script prints the list of URLs (pages_with_target) that contain the target_url.
Example Scenario:
If the target URL is https://newscurrentaffairs.info/free-tools/free-google-index-checker-seo-tool-create-your-own.html, the code will:
- Start crawling from the homepage (https://newscurrentaffairs.info).
- Visit each page on the site, checking if any of those pages contain a link to the target URL.
- Print a list of pages that contain the target URL.
Here is the Python code for the free internal links checker SEO tool:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

# Base URL of the website
base_url = "https://newscurrentaffairs.info"

# Target URL to check
target_url = "https://newscurrentaffairs.info/free-tools/free-google-index-checker-seo-tool-create-your-own.html"

# A set to track visited URLs
visited = set()

# A list to store pages containing the target URL
pages_with_target = []

def crawl(url):
    """Recursively crawl pages on the website."""
    if url in visited or not url.startswith(base_url):
        return
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        visited.add(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        # Check if the target URL is in the current page
        if soup.find("a", href=target_url):
            pages_with_target.append(url)
        # Find all internal links on the page
        for link in soup.find_all("a", href=True):
            href = link["href"]
            # Resolve relative URLs against the current page
            full_url = urljoin(url, href)
            # Ensure it's an internal link
            if urlparse(full_url).netloc == urlparse(base_url).netloc:
                crawl(full_url)
    except requests.RequestException as e:
        print(f"Error crawling {url}: {e}")

if __name__ == "__main__":
    # Start crawling from the homepage
    crawl(base_url)
    # Print results
    print("Pages containing the target URL:")
    for page in pages_with_target:
        print(page)
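One caveat: because crawl calls itself for every internal link it finds, a very large site can hit Python's default recursion limit (roughly 1000 frames). A minimal sketch of the same logic using an explicit stack instead of recursion avoids that; the crawl_iterative name is hypothetical and everything else matches the code above:
def crawl_iterative(start_url):
    """Same crawling logic as crawl(), but with an explicit stack instead of recursion."""
    stack = [start_url]
    while stack:
        url = stack.pop()
        if url in visited or not url.startswith(base_url):
            continue
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            visited.add(url)
            soup = BeautifulSoup(response.text, 'html.parser')
            if soup.find("a", href=target_url):
                pages_with_target.append(url)
            for link in soup.find_all("a", href=True):
                full_url = urljoin(url, link["href"])
                if urlparse(full_url).netloc == urlparse(base_url).netloc:
                    stack.append(full_url)
        except requests.RequestException as e:
            print(f"Error crawling {url}: {e}")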
Summary
- Install Python and the required libraries (pip install requests beautifulsoup4).
- Copy the code into a file and save it as internallinkschecker.py.
- From a command prompt, run: python internallinkschecker.py