Python Expired Domain Script
£250 (approx. $319)
- Posted:
- Proposals: 3
- Remote
- #419234
- Expired
Description
Experience Level: Expert
General information for the business: Expired Domain
Description of requirements/functionality:
General Mode of Operation
There is a need for a script that scans the Internet for domains whose owners have stopped paying for them at the end of the registration period and which the registrars therefore do not renew. Such domains are called "expired domains."
The script is to start from a given starting point (for example, a popular domain such as "www.bbc.com") and then, like a crawler, follow all external links found on that site. The URLs that are found are to be written into a database. At the same time, and on a permanent basis, the script is to scan this database for expired domains. If the script finds an expired domain, the URL of that domain is to be listed in a web interface for further manual examination. Meanwhile, the script continues to follow external links and add newly found URLs to the database.
The goal is to get as many domains and their URLs as possible into the database. The more domains there are in the database, the more URLs can be scanned, and the more URLs are scanned, the higher the potential yield of expired domains.
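The crawl step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `LinkCollector`, the functions `external_links` and `store_urls`, and the `urls` table layout are hypothetical names chosen for this example, and fetching the page itself is left out.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def external_links(base_url, html):
    """Return absolute http(s) links whose host differs from base_url's host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    own_host = urlsplit(base_url).hostname
    result = []
    for link in collector.links:
        parts = urlsplit(link)
        if parts.scheme in ("http", "https") and parts.hostname and parts.hostname != own_host:
            result.append(link)
    return result


def store_urls(conn, urls):
    """Insert URLs into an assumed 'urls' table, silently skipping duplicates."""
    conn.execute("CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY)")
    conn.executemany("INSERT OR IGNORE INTO urls (url) VALUES (?)", [(u,) for u in urls])
    conn.commit()
```

A real crawler would call `external_links` on each downloaded page, pass the result to `store_urls`, and also push the links onto a queue of pages still to visit.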
Special Mode of Operation
1. Blacklist
To ensure that only useful URLs are saved in the database from the outset, an automatic pre-selection is necessary. The script must follow all links that it finds (in order to discover additional URLs), but it must not save every URL in the database. This applies in particular to domains with certain words in their names, for example words from the field of pornography ("sex," "fuck," etc.). A blacklist must be created for such domain names. If the script finds a domain whose name contains a word from the blacklist, it is to follow the link (in order to find other potential domains) but must not write the blacklisted URL into the database.
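The "follow but do not store" rule might look like the sketch below. The names `BLACKLIST`, `is_blacklisted` and `handle_link` are hypothetical, the word list only repeats the two examples given above, and plain substring matching on the host name is an assumption about how the check is meant to work.

```python
from urllib.parse import urlsplit

# Example words taken from the description; a real list would be much longer.
BLACKLIST = {"sex", "fuck"}


def is_blacklisted(url, blacklist=BLACKLIST):
    """Return True if the host name of the URL contains any blacklisted word."""
    host = (urlsplit(url).hostname or "").lower()
    return any(word in host for word in blacklist)


def handle_link(url, frontier, database, blacklist=BLACKLIST):
    """Always queue the link for crawling; store it only if it is not blacklisted."""
    frontier.append(url)
    if not is_blacklisted(url, blacklist):
        database.add(url)
```

Note that substring matching also rejects harmless names that merely contain a blacklisted word (a host containing "essex", for example), so a whitelist of exceptions or a stricter matching rule may be worth discussing.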
2. Domains
In principle, as many domains as possible are to be stored in the database. In addition to domains that end in .com, .net or .org, this also includes those that end in, for example, .de, .ru, .tv, .cc or .co.uk. Thus, domains under both generic top-level domains (TLDs) and country-code TLDs (ccTLDs) are to be stored. Subdomains, and domains that do not meet the criteria above, are not to be stored.
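One way to reduce a URL to the domain that can actually be registered, dropping any subdomain, is sketched below. `KNOWN_SUFFIXES` and `registrable_domain` are hypothetical names, and the suffix set is deliberately tiny (just the endings mentioned above); a production script would more likely rely on the full Public Suffix List.

```python
from urllib.parse import urlsplit

# Assumption: only the endings named in the description are accepted.
KNOWN_SUFFIXES = {"com", "net", "org", "de", "ru", "tv", "cc", "co.uk"}


def registrable_domain(url):
    """Return 'name.suffix' for a URL, or None if the suffix is not accepted."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    # Try the two-label suffix first so that "co.uk" is matched as a whole.
    for size in (2, 1):
        if len(labels) > size and ".".join(labels[-size:]) in KNOWN_SUFFIXES:
            return ".".join(labels[-(size + 1):])
    return None
```

With this sketch, `registrable_domain("https://news.bbc.co.uk/x")` yields `"bbc.co.uk"`, so the subdomain `news` never reaches the database.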
3. Finding expired domains
The script must be able to identify which domains in the database have actually expired. To do so, it must monitor and scan the entire database on a permanent basis. All domains in the database must be checked continuously, one after another, to see whether they are still active. For this, the script should first briefly ping each domain (for example, through an HTTP request). If a domain answers, it is definitely still active. It therefore remains in the database until the next check, and so forth.
If a domain does not answer, it is possibly no longer active. After such a "non-answer," a WHOIS check must be conducted immediately. This shows whether the domain is really free or simply unavailable because, for example, the server is overloaded. If the domain is really free, it is listed in the web interface.
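The two-stage check (HTTP request first, WHOIS only on a non-answer) could be organised along these lines. The function names and the three result labels are assumptions made for this sketch, and the WHOIS lookup is passed in as a callable because its details depend on the registry.

```python
import urllib.error
import urllib.request


def http_probe(domain, timeout=5.0):
    """Return True if any HTTP server answers for the domain."""
    try:
        with urllib.request.urlopen(f"http://{domain}/", timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A 4xx/5xx status still means that a server answered.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        return False


def check_domain(domain, probe=http_probe, whois_says_free=None):
    """Classify a domain as 'active', 'candidate' or 'unreachable'."""
    if probe(domain):
        return "active"        # stays in the database until the next pass
    if whois_says_free is not None and whois_says_free(domain):
        return "candidate"     # to be listed in the web interface
    return "unreachable"       # e.g. overloaded server; check again later
```

Keeping the probe and the WHOIS lookup as parameters makes it possible to test the decision logic without touching the network, for example `check_domain("a.example", probe=lambda d: False, whois_says_free=lambda d: True)`.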
4. WHOIS
WHOIS queries are not always trouble-free. For example, at the German registry (DENIC), one can only query 1000 domains before being blocked. In this case, a solution would have to be found to avoid being blocked, whether by routing the queries through proxies or in another way. In addition, CNOBI domains (.com, .net, .org, .biz, .info) have a so-called "redemption period." Such domains are actually free, but may not be newly registered for 60 days after release. At first sight, these domains appear to be available for registration, but they are not. A solution would also have to be found for this.
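As a rough illustration of the redemption-period problem, the sketch below sends a raw WHOIS query (per RFC 3912: TCP port 43, query terminated by CRLF) and sorts the reply into three groups. The marker strings in `NOT_FOUND_MARKERS` and `HOLD_MARKERS`, and all function names, are assumptions for this example; reply formats differ between registries, so the markers would need to be verified per TLD.

```python
import socket


def whois_query(domain, server, timeout=10.0):
    """Send one WHOIS query to the given server and return the reply text."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:       # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Assumed markers; they are not guaranteed for every registry.
NOT_FOUND_MARKERS = ("no match for", "not found")
HOLD_MARKERS = ("redemptionperiod", "pendingdelete")


def classify_whois(text):
    """Return 'redemption', 'free' or 'registered' for a WHOIS reply."""
    lowered = text.lower()
    if any(marker in lowered for marker in HOLD_MARKERS):
        return "redemption"    # looks free, but cannot be registered yet
    if any(marker in lowered for marker in NOT_FOUND_MARKERS):
        return "free"
    return "registered"
```

A call such as `classify_whois(whois_query("example.com", "whois.verisign-grs.com"))` would combine the two steps; domains classified as "redemption" could be kept in the database and rechecked later instead of being reported as free.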
See attachment for full details
Kevin W.
100% (4)
Projects Completed: 9
Freelancers worked with: 9
Projects awarded: 12%
Last project: 7 Dec 2019
United Kingdom