
Excel routine to compare address data sets and score for similarity
5811
$$
- Proposals: 1
- Remote
- #24526
- Archived
Description
Experience Level: Intermediate
This job is to create a Microsoft Excel workbook (compatible with Excel 2004 for Mac), or a web-based database application, that will compare two sets of data and score individual records from one set for matches with data from the second set.
The first data set (Master file) is a file containing about 8,000 names and addresses with the following seven fields: Name, four Address fields, County, Full UK Postcode.
The second data set (Reference file) is a file of about 4,000 records with two fields: Name and Part UK Postcode (the first half of the postcode, typically three or four characters).
Rather than manually compare the two lists, I would like a smart Excel solution to find and score matches between the two data sets.
I don't know if this is even possible, but the ideal solution will produce a data set that includes the seven Master data fields and an additional field that provides a match score, possibly as a percentage.
As a starting point, the two postcode fields should match (the Master postcode must contain the first part of the postcode given in the Reference file). If the postcodes match, the name fields should be compared for similarity and a match score calculated (100% for a direct hit).
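To make the rule above concrete, here is a rough sketch in Python rather than Excel. It is an illustration only, not the required deliverable. It assumes the "first part" of a postcode is everything before the final three characters, and it uses `difflib.SequenceMatcher` as one possible similarity measure; the function names (`outward_code`, `name_score`, `best_match`) are made up for this example.

```python
from difflib import SequenceMatcher


def outward_code(postcode: str) -> str:
    """Return the first half of a full UK postcode, e.g. 'SW1A 1AA' -> 'SW1A'."""
    compact = postcode.replace(" ", "").upper()
    # The second half of a full UK postcode is always three characters long.
    return compact[:-3] if len(compact) > 3 else compact


def normalise(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def name_score(a: str, b: str) -> float:
    """Similarity of two names as a percentage (100.0 for a direct hit)."""
    ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return round(100 * ratio, 1)


def best_match(master_name, master_postcode, reference):
    """Score one Master record against (name, part_postcode) Reference pairs.

    Only Reference rows whose part postcode equals the first half of the
    Master postcode are compared. Returns (score, reference_name), or
    (0.0, None) when no Reference row shares the postcode.
    """
    district = outward_code(master_postcode)
    best_score, best_name = 0.0, None
    for ref_name, part in reference:
        if part.replace(" ", "").upper() != district:
            continue  # postcode gate failed, so the names are not compared
        score = name_score(master_name, ref_name)
        if best_name is None or score > best_score:
            best_score, best_name = score, ref_name
    return best_score, best_name
```

Comparing the first half for equality, rather than checking that the Master postcode merely starts with the part postcode, avoids a Reference value such as B1 matching a Master postcode in B12. Whether that stricter reading is the right one would need to be confirmed against the sample files.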
I will attach sample Master and Reference files (tab delimited format) that may help explain the task at hand.
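For anyone prototyping outside Excel, the sketch below shows one way the tab-delimited files could be read and the scored output written. It is only an outline under several assumptions: the files have no header row, the Master columns are in the order listed above (name first, full postcode seventh), and the score is written as a whole-number percentage. The helper names `read_tsv`, `score_rows` and `write_tsv` are hypothetical.

```python
import csv
from difflib import SequenceMatcher


def read_tsv(handle):
    """Read tab-delimited rows from an open file, skipping blank lines."""
    return [row for row in csv.reader(handle, delimiter="\t") if row]


def score_rows(master_rows, reference_rows):
    """Return each seven-field Master row with a match score appended."""
    # Group Reference names by part postcode so each Master row is only
    # compared with names in the same postcode district.
    by_district = {}
    for name, part in reference_rows:
        by_district.setdefault(part.strip().upper(), []).append(name)

    scored = []
    for row in master_rows:
        name, postcode = row[0], row[6]
        # Assumption: the first half is everything before the last 3 characters.
        district = postcode.replace(" ", "").upper()[:-3]
        candidates = by_district.get(district, [])
        best = max(
            (SequenceMatcher(None, name.lower(), c.lower()).ratio()
             for c in candidates),
            default=0.0,  # no Reference row shares the postcode
        )
        scored.append(row + [f"{100 * best:.0f}%"])
    return scored


def write_tsv(rows, handle):
    """Write rows back out in tab-delimited format."""
    csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)
```

With real files, the handles would come from `open(path, newline="")`; the result is the original seven fields plus an eighth column holding the score, which opens directly in Excel.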
Graeme K.
100% (39)
- Projects completed: 36
- Freelancers worked with: 29
- Projects awarded: 72%
- Last project: 11 Aug 2024
- United Kingdom