Find Sitemap & Search URLs API (for external domains/sitemaps)
€28/hr(approx. $30/hr)
- Posted:
- Proposals: 14
- Remote
- #3873563
- Awarded
Description
Experience Level: Expert
I need an API that I can use to search for a URL, or part of a URL, within an external site’s sitemap.
Usage: Laravel & MySQL
A sitemap is not always in the same location, and its location is not always listed in robots.txt. So we need to save known sitemap locations in a MySQL table and try those locations on other domains, which lets us locate more sitemaps.
Finding a sitemap
Create a MySQL table “sitemaps” (example name) to store sitemap names (e.g. sitemap.xml, sitemap_index.php). The table has a ‘sitemap’ field and a ‘count’ field; ‘count’ is a counter that goes up each time we find a sitemap with the same name.
Check whether the given domain has a robots.txt (https://example.com/robots.txt). If it does, look for the Sitemap directive:
“Sitemap: https://www.example.com/example.xml” (there can be more than one)
Save the sitemap location to the sitemaps table. If it already exists, add 1 to the count field.
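As a rough illustration of the robots.txt step, here is a minimal Python sketch (the real project targets Laravel, so treat this as pseudocode that happens to run). The function names are hypothetical, the Counter stands in for the ‘sitemaps’ table, and storing the path part of the URL as the "sitemap name" is an assumption about what the table should hold.

```python
from collections import Counter
from urllib.parse import urlparse


def extract_sitemap_urls(robots_txt: str) -> list[str]:
    """Return every URL named by a 'Sitemap:' directive (case-insensitive)."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop robots.txt comments
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def record_sitemap_names(urls: list[str], counts: Counter) -> None:
    """Bump the counter for each sitemap path (stand-in for the 'count' field)."""
    for url in urls:
        name = urlparse(url).path.lstrip("/")  # assumption: the path is the name
        if name:
            counts[name] += 1
```

In MySQL the "+1 if it already exists" step could be a single statement of the form INSERT ... ON DUPLICATE KEY UPDATE count = count + 1, provided the ‘sitemap’ column has a unique key.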
If robots.txt does not give the sitemap location, try every sitemap location stored in the sitemaps table (the more we collect, the higher the chance of a hit), checking the sitemaps with the highest counts first.
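The fallback could look roughly like the Python sketch below. The function names are hypothetical, the dict stands in for rows of the sitemaps table, and the `exists` callback is an assumed hook for whatever HTTP check the real implementation uses (nothing here touches the network). Always using https:// is also an assumption.

```python
from typing import Callable, Optional


def candidate_sitemap_urls(domain: str, counts: dict[str, int]) -> list[str]:
    """Build URLs to probe, most frequently seen sitemap name first."""
    # Sort by count descending; ties are broken alphabetically for stable output.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"https://{domain}/{name}" for name, _ in ordered]


def find_sitemap(
    domain: str, counts: dict[str, int], exists: Callable[[str], bool]
) -> Optional[str]:
    """Return the first candidate URL that exists, or None if none do."""
    for url in candidate_sitemap_urls(domain, counts):
        if exists(url):
            return url
    return None
```

Stopping at the first hit keeps the number of requests low; a variant that collects every hit would suit domains that publish several sitemaps.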
Finding a URL
Once you have found the sitemap(s), create an index of all URLs in the sitemap and its nested sitemaps.
Then look for the given search term using a MySQL query or a regex.
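To make the indexing step concrete, here is a small Python sketch that walks a sitemap and its nested sitemaps and then filters by search term. It assumes the standard sitemap XML layout (a sitemapindex root listing child sitemaps, a urlset root listing pages). `collect_urls`, `search_urls` and the `fetch` callback are hypothetical names; `fetch` is expected to return the XML bytes for a URL. The search is a plain case-insensitive substring match, which is one possible reading of "part of a URL".

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def _local(tag: str) -> str:
    """Strip the XML namespace, e.g. '{http://...}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def _entry_locs(root: ET.Element) -> list[str]:
    """Read the <loc> that is a direct child of each <url> or <sitemap> entry."""
    locs = []
    for entry in root:
        for child in entry:
            if _local(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())
    return locs


def collect_urls(
    sitemap_url: str,
    fetch: Callable[[str], bytes],
    seen: Optional[set] = None,
) -> list[str]:
    """Return all page URLs in a sitemap, following nested sitemap indexes."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against sitemaps that reference each other
        return []
    seen.add(sitemap_url)
    root = ET.fromstring(fetch(sitemap_url))
    locs = _entry_locs(root)
    if _local(root.tag) == "sitemapindex":
        urls = []
        for child_sitemap in locs:
            urls.extend(collect_urls(child_sitemap, fetch, seen))
        return urls
    return locs


def search_urls(urls: list[str], term: str) -> list[str]:
    """Case-insensitive substring match of the search term against each URL."""
    needle = term.lower()
    return [url for url in urls if needle in url.lower()]
```

Since the sitemaps come from external sites, a hardened XML parser may be worth considering in place of the standard-library one used here.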
Example request
/Sitemap?domain=example.com&search=url
Example API response:
I want the API to return the matching URLs in JSON format:
{
  "search": "example",
  "domain": "domain.com",
  "statistics": {
    "sitemaps_found": 3,
    "sitemaps": {
      "1": "www.domain.com/sitemap1.xml",
      "2": "www.domain.com/sitemap453.xml",
      "3": "www.domain.com/sitemap345.xml"
    },
    "urls": 28892,
    "matches": 25
  },
  "matches": {
    "1": "www.domain.com/example/13324223",
    "2": "www.domain.com/example/94827497"
  }
}
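A response of this shape could be assembled along the following lines. This is a Python sketch with a hypothetical `build_response` helper. It keeps the numbered keys from the example above, although plain JSON arrays would be a reasonable alternative to discuss.

```python
import json


def build_response(search: str, domain: str, sitemaps: list[str], urls: list[str]) -> dict:
    """Assemble the response structure from the sitemaps found and their URLs."""
    needle = search.lower()
    matches = [url for url in urls if needle in url.lower()]
    return {
        "search": search,
        "domain": domain,
        "statistics": {
            "sitemaps_found": len(sitemaps),
            # Numbered keys starting at 1, mirroring the example response.
            "sitemaps": {str(i): s for i, s in enumerate(sitemaps, start=1)},
            "urls": len(urls),
            "matches": len(matches),
        },
        "matches": {str(i): m for i, m in enumerate(matches, start=1)},
    }


def to_json(response: dict) -> str:
    """Serialize the response as indented JSON text."""
    return json.dumps(response, indent=2)
```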
Discussion
We can save the sitemap files we find to our server and search within those files, or we can insert all sitemap URLs into a MySQL table and search from there. I am not sure which is faster; let’s discuss.
Save all URLs in MySQL
Pro: Fast searching
Pro: Easy to create a cron job that deletes entries older than x hours
Pro: Easy maintenance
Con: Need to extract all URLs from the sitemap files (potentially hundreds of thousands of URLs)
Save sitemaps as files
Pro: No need to extract URLs and put them in MySQL
Con: Downloading files that might contain vulnerabilities
Con: Saving files takes more space than saving only the URLs in MySQL
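To help the discussion, here is a rough Python sketch of the database option. It uses the standard-library sqlite3 module with an in-memory database purely as a stand-in for MySQL. The table name `sitemap_urls`, its columns and all function names are hypothetical, and the SQL would need small adjustments for MySQL.

```python
import sqlite3
import time
from typing import Optional


def open_index() -> sqlite3.Connection:
    """Create an in-memory table holding one row per sitemap URL."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE sitemap_urls ("
        "domain TEXT NOT NULL, url TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    db.execute("CREATE INDEX idx_sitemap_urls_domain ON sitemap_urls (domain)")
    return db


def store_urls(db: sqlite3.Connection, domain: str, urls: list[str],
               now: Optional[float] = None) -> None:
    """Insert the extracted URLs with the time they were fetched."""
    now = time.time() if now is None else now
    db.executemany(
        "INSERT INTO sitemap_urls (domain, url, fetched_at) VALUES (?, ?, ?)",
        [(domain, url, now) for url in urls],
    )


def search(db: sqlite3.Connection, domain: str, term: str) -> list[str]:
    """Substring search within one domain; LIKE wildcards in the term are escaped."""
    escaped = term.replace("!", "!!").replace("%", "!%").replace("_", "!_")
    rows = db.execute(
        "SELECT url FROM sitemap_urls "
        "WHERE domain = ? AND url LIKE ? ESCAPE '!' ORDER BY url",
        (domain, f"%{escaped}%"),
    )
    return [row[0] for row in rows]


def purge_older_than(db: sqlite3.Connection, hours: float,
                     now: Optional[float] = None) -> int:
    """Delete rows older than the given number of hours; return how many were removed."""
    now = time.time() if now is None else now
    cursor = db.execute(
        "DELETE FROM sitemap_urls WHERE fetched_at < ?", (now - hours * 3600,)
    )
    return cursor.rowcount
```

One point for the speed question: a pattern with a leading wildcard, such as '%term%', generally cannot use an ordinary index on the URL column, so each search scans all rows stored for that domain. Filtering by domain first keeps that scan bounded.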
>>> Outside the scope of the initial task, but would be a follow-up task; do not price this in <<<
We will use the API for our own backend, but I also want to publish the tool online. So when the API is finished, I will ask you to create a simple frontend for the sitemap search tool.
Nested sitemap example:
https://www.example.com/sitemap1.xml.gz
https://www.example.com/sitemap2.xml.gz
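Nested sitemaps like these are often gzip-compressed, so the body needs decompressing before it is parsed. A minimal Python sketch follows, with a hypothetical `decode_sitemap` helper. It checks the gzip magic bytes instead of trusting the .gz extension, on the assumption that some servers or HTTP clients may already have decompressed the body.

```python
import gzip


def decode_sitemap(body: bytes) -> bytes:
    """Return XML bytes, decompressing only when the body is gzip data."""
    if body[:2] == b"\x1f\x8b":  # gzip magic number
        return gzip.decompress(body)
    return body
```

Because the files come from external sites, a cap on the decompressed size would be a sensible addition in a real implementation.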
Projects Completed
55
Freelancers worked with
41
Projects awarded
25%
Last project
20 May 2023
Netherlands