PHP 7 Script to read sitemap.xml, check URLs for 40x and 30x + check resources
$58
- Posted:
- Proposals: 1
- Remote
- #1274368
- PRE-FUNDED
- Awarded
Description
Experience Level: Intermediate
General information for the website: webstore
Num. of web pages/modules: 1
Description of every page/module: Rationale: I am looking for a PHP 7 script that reads sitemap.xml, checks the URLs for 40x and 30x, and checks the local/remote resources on each page. So this is NOT a crawler, just a resource checker. The results are then emailed.
Business reason: remove/fix all internal 40x and 30x in the sitemap, plus resources hosted locally on pages, and optionally a-href links.
PHP SCRIPT
- runs from the command line via cron; options via params/args
- runs over HTTP (results on screen); options via GET params
- without options: reads the local sitemap and checks every page for 30x and 40x
- option --autodomain: takes a domain name (www.domain.com or domain.com) and
a) tries to fetch www.domain.com/robots.txt and parses it to find the sitemap.xml mentioned there (example code available)
b) if no robots.txt is found, tries www.domain.com/sitemap.xml
- option --alldomains: value 1 or true => when a robots file is found, it can contain more than one sitemap; if this option is true, all sitemaps are processed and the report presents the results per domain
- option --sitemap: takes an HTTP link or a local path to the sitemap
- option --checkres: for every page it checks, it reads the contents, then checks all resources such as IMG (not a href) found on that page for 30x and 40x
- option --checkhref: for every page it checks, it reads the contents, then checks all the a-href links found on that page for 30x and 40x
- option --40xonly: do the above for both resources and hrefs, but only check and report 40x
- option --30xonly: do the above for both resources and hrefs, but only check and report 30x
- option --all: check all of the above
- results are listed as:
* sorted per status code
* each local page that is in error (30x or 40x)
* or each local page with an indented list of the resources or hrefs found on that page that are in error (30x or 40x)
- option --emailto:mail@mail.com : sends the results to the given e-mail address
- the above are the command-line options
- the same options should be available in the web interface when the script is called via HTTP: in that case a page is shown where we can enter the above params and hit enter to execute (step 1 is input, step 2 is execute)
- assume most of the needed functionality exists in PHP (cURL, XML parsing, etc.) or check with us first
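The core of the spec above is the per-URL status check. A minimal sketch using PHP's cURL extension, as the spec suggests; the helper names `statusBucket` and `checkStatus` are illustrative assumptions, not part of the spec:

```php
<?php
// Sketch only. A HEAD request keeps the check cheap; redirects are NOT
// followed, because 30x responses are exactly what we want to report.

// Classify an HTTP status code into the buckets the report uses.
function statusBucket(int $code): ?string {
    if ($code >= 300 && $code < 400) return '30x';
    if ($code >= 400 && $code < 500) return '40x';
    return null; // 2xx and everything else is not reported
}

// Return the HTTP status code for a URL, or 0 on a transport error.
function checkStatus(string $url): int {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD request: status line only
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => false, // surface 30x instead of following it
        CURLOPT_TIMEOUT        => 10,
    ]);
    curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code;
}
```

With these two helpers, the --40xonly and --30xonly options reduce to filtering on the bucket returned by `statusBucket`.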
Results example
Sitemap Errors
301: domain.com/page1
302: domain.com/page12
404: domain.com/page24
etc
Resource Errors (img/embed/etc)
- domain.com/page1
301: domain.com/page1/myimage.png
404: remoteblog.com/page1/image66.png
- domain.com/page2
302: domain.com/page2/myimagexx.png
404: remoteblog.com/page1/image66.png
etc
Ahref/link Errors
- domain.com/page1
301: domain.com/page1/page2.html
404: remoteblog.com/page1/other page.html
- domain.com/page2
302: domain.com/page2/myimagexx.png
404: remoteblog.com/page1/image66.png
etc
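The --checkres and --checkhref options behind the Resource and Ahref sections above require pulling resource and link URLs out of each page's HTML. A minimal sketch using PHP's DOMDocument; the helper name `pageLinks` and the exact resource tag list are assumptions, not from the spec:

```php
<?php
// Sketch only: collects resource src attributes and a-href values from HTML.

// Return ['res' => [src URLs], 'href' => [a-href URLs]] for one page.
function pageLinks(string $html): array {
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // @: real-world HTML is rarely well-formed
    $out = ['res' => [], 'href' => []];
    // Resource-bearing tags; the spec mentions img/embed, script is assumed.
    foreach (['img', 'embed', 'script'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $el) {
            $src = $el->getAttribute('src');
            if ($src !== '') $out['res'][] = $src;
        }
    }
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '') $out['href'][] = $href;
    }
    return $out;
}
```

Each collected URL would then be resolved against the page URL and passed through the status check, with failures grouped under the page as in the example report.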
Description of requirements/features: Job is for experienced PHP developers only
Needs to run on PHP 7
Expert in cURL or wget
Sitemap, XML parsing, HTML parsing
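The robots.txt and sitemap.xml steps behind --autodomain can be sketched with standard PHP string functions and SimpleXML; the helper names are illustrative assumptions:

```php
<?php
// Sketch only: pure parsing helpers, no network I/O.

// Extract sitemap URLs from robots.txt content ("Sitemap: <url>" lines).
function sitemapsFromRobots(string $robotsTxt): array {
    $sitemaps = [];
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        if (preg_match('/^\s*Sitemap:\s*(\S+)/i', $line, $m)) {
            $sitemaps[] = $m[1];
        }
    }
    return $sitemaps;
}

// Extract page URLs (<loc> entries) from sitemap.xml content.
function urlsFromSitemap(string $xml): array {
    $urls = [];
    $sm = simplexml_load_string($xml);
    if ($sm === false) return $urls;
    // Real sitemaps declare the sitemaps.org schema as the default
    // namespace; select children in that namespace when present.
    $ns = $sm->getDocNamespaces();
    $entries = isset($ns['']) ? $sm->children($ns['']) : $sm;
    foreach ($entries->url as $entry) {
        $urls[] = (string) $entry->loc;
    }
    return $urls;
}
```

A sitemap index file (`<sitemapindex>` with nested `<sitemap><loc>` entries) would need one more pass of the same shape; the spec's --alldomains option implies handling more than one sitemap per robots.txt.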
Extra notes:
Mark B.
- Rating: 99% (36)
- Projects completed: 17
- Freelancers worked with: 15
- Projects awarded: 26%
- Last project: 25 Apr 2018
- Netherlands