Parsing XLS file scripting data modelling
- or -
Post a project like this1435
€40(approx. $43)
- Posted:
- Proposals: 6
- Remote
- #2804873
- Awarded
Excel and VBA expert. Professional OCR. PDF to Excel Conversion. Data management, Translations English Russian
Hua Hin
110463021476402385436389037139717903999903
Description
Experience Level: Entry
Estimated project duration: 1 day or less
Parsing XLS file scripting data modelling
This job is ideal for an coding expert of scripting/programmer and data modelling.
This job is about parsing 3 Excel files by following strict instructions, providing intermediate working files and reporting accordingly.
You need to deliver this work in 24 hours and will be paid only if the deliverable are controllable and consistent.
In order to be assigned the job, please provide a sample for the final file (version 1, 2 and 3) for one country (e.g. DE) for all 3 files.
INSTRUCTIONS
You have 3 files whose file names are:
Epale to parse
Erasmus+ to parse
Etwinning to parse
1) Country abbreviations
For each file FILE_NAME:
map the country to the abbreviations listed in https://www.gesis.org/fileadmin/upload/dienstleistung/daten/soz_indikatoren/eusi/List_of_Country_Abbreviations.pdf
and add the column country abbreviation.
Call the file FILE_NAME_1
2) Split them by comma or other separator
For each file:
For each row:
If the column email includes a number n of "," or other separator (eg ";"), replace the row with n+1 rows each with the same row country and a single email per row.
Example:
NL info@trinitascollege.nl,infohf@trinitascollege.nl,infojb@trinitascollege.nl
becomes
NL info@trinitascollege.nl
NL infohf@trinitascollege.nl
NL infojb@trinitascollege.nl
Call the file FILE_NAME_2
3) Deduplicate
For each file:
deduplicate records.
Call the file FILE_NAME_3
4) Merge:
Add to each file a column called "Source" and fill it in with the file name.
Example
The record from the Erasmus+:
NL info@trinitascollege.nl
becomes
NL info@trinitascollege.nl Erasmus+
Create a new file "Final file"
For each file:
Add each record to the new file.
If the email of the record already exists in the new file, add the name of the source file separated by a "," in the Source column.
Example
The record :
TR info@adalet.gov.tr
exists both in Epale and Erasmus+
So it will become
TR info@adalet.gov.tr Epale, Erasmus+
5) Invalid
Add a column Invalid and mark it with an X if the email has an invalid synthax, e.g. two "@" instead of one or no extension
6) Anomaly
Add a column Anomaly and mark it with an X if the email extension does not belong to the country (exclude .eu, .com, .org., .net)
Call the file FINAL_1
7) Divide by sheets 19+others
Split records by sheet in the same file for the following countries
DE
PL
ES
SE
FI
UK
FR
DK
CZ
IT
AT
NO
HU
SK
LV
NL
HR
LT
RO
Place all other countries in a sheet called "Others"
Call the file FINAL_2
8) Divide by sheets 12+others
DE
PL
ES
SE
FI
UK
FR
DK
CZ
IT
AT
NO
Place all other countries in a sheet called "Others"
Call the file FINAL_3
9) Produce a report to count:
- number of emails by country and file name at step 2
- number of emails by country and file name at step 3
- number of emails by country at step 4
- number of invalid email (step 5) and anomalies (step 6)
- number of records by country in FINAL_1 , 2 and 3
10) Deliverables
Epale to parse 1 , 2 and 3
Erasmus+ to parse 1 , 2 and 3
Etwinning to parse 1 , 2 and 3
Final file 1, 2 and 3
Report
This job is ideal for an coding expert of scripting/programmer and data modelling.
This job is about parsing 3 Excel files by following strict instructions, providing intermediate working files and reporting accordingly.
You need to deliver this work in 24 hours and will be paid only if the deliverable are controllable and consistent.
In order to be assigned the job, please provide a sample for the final file (version 1, 2 and 3) for one country (e.g. DE) for all 3 files.
INSTRUCTIONS
You have 3 files whose file names are:
Epale to parse
Erasmus+ to parse
Etwinning to parse
1) Country abbreviations
For each file FILE_NAME:
map the country to the abbreviations listed in https://www.gesis.org/fileadmin/upload/dienstleistung/daten/soz_indikatoren/eusi/List_of_Country_Abbreviations.pdf
and add the column country abbreviation.
Call the file FILE_NAME_1
2) Split them by comma or other separator
For each file:
For each row:
If the column email includes a number n of "," or other separator (eg ";"), replace the row with n+1 rows each with the same row country and a single email per row.
Example:
NL info@trinitascollege.nl,infohf@trinitascollege.nl,infojb@trinitascollege.nl
becomes
NL info@trinitascollege.nl
NL infohf@trinitascollege.nl
NL infojb@trinitascollege.nl
Call the file FILE_NAME_2
3) Deduplicate
For each file:
deduplicate records.
Call the file FILE_NAME_3
4) Merge:
Add to each file a column called "Source" and fill it in with the file name.
Example
The record from the Erasmus+:
NL info@trinitascollege.nl
becomes
NL info@trinitascollege.nl Erasmus+
Create a new file "Final file"
For each file:
Add each record to the new file.
If the email of the record already exists in the new file, add the name of the source file separated by a "," in the Source column.
Example
The record :
TR info@adalet.gov.tr
exists both in Epale and Erasmus+
So it will become
TR info@adalet.gov.tr Epale, Erasmus+
5) Invalid
Add a column Invalid and mark it with an X if the email has an invalid synthax, e.g. two "@" instead of one or no extension
6) Anomaly
Add a column Anomaly and mark it with an X if the email extension does not belong to the country (exclude .eu, .com, .org., .net)
Call the file FINAL_1
7) Divide by sheets 19+others
Split records by sheet in the same file for the following countries
DE
PL
ES
SE
FI
UK
FR
DK
CZ
IT
AT
NO
HU
SK
LV
NL
HR
LT
RO
Place all other countries in a sheet called "Others"
Call the file FINAL_2
8) Divide by sheets 12+others
DE
PL
ES
SE
FI
UK
FR
DK
CZ
IT
AT
NO
Place all other countries in a sheet called "Others"
Call the file FINAL_3
9) Produce a report to count:
- number of emails by country and file name at step 2
- number of emails by country and file name at step 3
- number of emails by country at step 4
- number of invalid email (step 5) and anomalies (step 6)
- number of records by country in FINAL_1 , 2 and 3
10) Deliverables
Epale to parse 1 , 2 and 3
Erasmus+ to parse 1 , 2 and 3
Etwinning to parse 1 , 2 and 3
Final file 1, 2 and 3
Report
Francesco C.
99% (70)Projects Completed
70
Freelancers worked with
22
Projects awarded
62%
Last project
26 Jan 2024
Italy
New Proposal
Login to your account and send a proposal now to get this project.
Log inClarification Board Ask a Question
-
can we do this my Ms Power Query?
912925
We collect cookies to enable the proper functioning and security of our website, and to enhance your experience. By clicking on 'Accept All Cookies', you consent to the use of these cookies. You can change your 'Cookies Settings' at any time. For more information, please read ourCookie Policy
Cookie Settings
Accept All Cookies