A Python script to load CSV data into PostgreSQL
£100 (approx. $124)
- Proposals: 8
- Remote
- #2071728
- Awarded
Description
Experience Level: Intermediate
The objective is to showcase a full ETL pipeline, from a data source to a data warehouse, using Python and SQL.
I would like to extract World Bank data on the volume of remittances and migration between countries. In the end, I would like this data to be in a PostgreSQL database and to answer these questions:
1. Top 10 country-to-country corridors by number of migrants.
2. Top 10 country-to-country corridors by volume of remittances.
3. Top 10 sending countries by number of migrants.
4. Top 10 receiving countries by number of migrants.
5. Top 10 net senders by number of migrants.
6. Top 10 net receivers by number of migrants.
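As an illustration only, the six questions could be answered with plain SQL against the proposed DWH corridor table (from_country, to_country, remittance_value, migration_value). The sketch below builds those queries as Python strings. It assumes that a "sending" country is the migrants' country of origin (from_country) and that net senders are ranked by emigrants minus immigrants; the helper names top_corridors_sql, country_totals_sql and net_migration_sql are hypothetical.

```python
# Hypothetical query builders for questions 1-6. Table names are inserted
# with f-strings, so only pass trusted constants such as "dwh.corridor".
# Assumption: a "sending" country is the migrants' origin (from_country).


def top_corridors_sql(table: str, measure: str, limit: int = 10) -> str:
    """Questions 1-2: largest corridors by migration_value or remittance_value."""
    if measure not in ("migration_value", "remittance_value"):
        raise ValueError(f"unsupported measure: {measure}")
    return f"""
        SELECT from_country, to_country, {measure}
        FROM {table}
        WHERE {measure} IS NOT NULL
        ORDER BY {measure} DESC
        LIMIT {int(limit)}"""


def country_totals_sql(table: str, role: str, limit: int = 10) -> str:
    """Questions 3-4: total migrants per sending or receiving country."""
    column = {"sending": "from_country", "receiving": "to_country"}[role]
    return f"""
        SELECT {column} AS country, SUM(COALESCE(migration_value, 0)) AS migrants
        FROM {table}
        GROUP BY {column}
        ORDER BY migrants DESC
        LIMIT {int(limit)}"""


def net_migration_sql(table: str, role: str, limit: int = 10) -> str:
    """Questions 5-6: net senders (out - in) or net receivers (in - out)."""
    net = {
        "senders": "SUM(outflow) - SUM(inflow)",
        "receivers": "SUM(inflow) - SUM(outflow)",
    }[role]
    # Each corridor counts as an outflow for its origin and an inflow for
    # its destination; UNION ALL stacks both views before grouping.
    return f"""
        SELECT country, {net} AS net_migrants
        FROM (
            SELECT from_country AS country,
                   COALESCE(migration_value, 0) AS outflow,
                   0 AS inflow
            FROM {table}
            UNION ALL
            SELECT to_country, 0, COALESCE(migration_value, 0)
            FROM {table}
        ) AS flows
        GROUP BY country
        ORDER BY net_migrants DESC
        LIMIT {int(limit)}"""
```

For example, `cursor.execute(net_migration_sql("dwh.corridor", "senders"))` followed by `cursor.fetchall()` would return the ten largest net senders under these definitions. The COALESCE calls matter in PostgreSQL, which sorts NULLs first in descending order.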
I will provide the CSV data feeds.
Please use Python to prepare the data for the database, and use SQL (PostgreSQL) for any data manipulation inside the database.
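The exact layout of the supplied feeds is not stated. If they follow a matrix layout (one row per origin country, one column per destination, as the World Bank's bilateral migration and remittance matrices do), a likely Python preparation step is to unpivot the matrix into (from_country, to_country, value) rows for the staging tables. The sketch below assumes that layout; unpivot_matrix and AGGREGATE_LABELS are hypothetical names, and the aggregate labels must be checked against the real files.

```python
import csv
from collections.abc import Iterable, Iterator

# Hypothetical labels of total rows/columns to drop; adjust to the real feeds.
AGGREGATE_LABELS = frozenset({"World", "Total"})


def unpivot_matrix(
    lines: Iterable[str], skip: frozenset[str] = AGGREGATE_LABELS
) -> Iterator[tuple[str, str, float]]:
    """Yield (from_country, to_country, value) from a matrix-shaped CSV.

    Assumes the first row holds destination names after a corner cell and
    the first cell of each later row holds the origin name.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty file
    destinations = [name.strip() for name in header[1:]]
    for row in reader:
        if not row:
            continue  # blank line
        origin = row[0].strip()
        if origin in skip:
            continue  # aggregate row, not a real country
        for destination, cell in zip(destinations, row[1:]):
            if destination in skip:
                continue  # aggregate column
            text = cell.strip().replace(",", "")  # drop thousands separators
            if not text:
                continue  # no estimate for this pair
            try:
                value = float(text)
            except ValueError:
                continue  # non-numeric marker; skip rather than fail
            yield origin, destination, value
```

Because this is a generator, rows are produced one at a time and can be streamed into a batched insert without reading the whole file into memory. The csv module documentation recommends opening the file with `newline=""`.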
Data source (csv) ---\
Data source (csv) -----> Python -----> PostgreSQL [ staging --> SQL --> DWH ]
Data source (csv) ---/
Proposed tables for the staging schema:
- country
- remittance
- migration
Proposed tables for the DWH schema:
- country
- corridor (from_country, to_country, remittance_value, migration_value)
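One possible reading of the proposed staging and DWH schemas, sketched as PostgreSQL statements held in Python lists so that the single script can run them. Column names and types are assumptions, and so is the corridor orientation: corridors are keyed by the migrants' direction (origin to destination), and because remittances usually flow the other way, from the migrants' host country back to their origin, remittance rows are joined on the reversed country pair. STAGING_DDL, DWH_DDL, TRANSFORM_SQL and run_sql are hypothetical names.

```python
from collections.abc import Iterable
from typing import Any

# Staging tables mirror the raw feeds (assumed columns).
STAGING_DDL = [
    "CREATE SCHEMA IF NOT EXISTS staging",
    """CREATE TABLE IF NOT EXISTS staging.country (
           country_name TEXT NOT NULL
       )""",
    """CREATE TABLE IF NOT EXISTS staging.migration (
           from_country TEXT NOT NULL,  -- origin of the migrants
           to_country   TEXT NOT NULL,  -- destination of the migrants
           value        NUMERIC
       )""",
    """CREATE TABLE IF NOT EXISTS staging.remittance (
           from_country TEXT NOT NULL,  -- remittance sender
           to_country   TEXT NOT NULL,  -- remittance receiver
           value        NUMERIC
       )""",
]

DWH_DDL = [
    "CREATE SCHEMA IF NOT EXISTS dwh",
    """CREATE TABLE IF NOT EXISTS dwh.country (
           country_name TEXT PRIMARY KEY
       )""",
    """CREATE TABLE IF NOT EXISTS dwh.corridor (
           from_country     TEXT NOT NULL REFERENCES dwh.country (country_name),
           to_country       TEXT NOT NULL REFERENCES dwh.country (country_name),
           remittance_value NUMERIC,
           migration_value  NUMERIC,
           PRIMARY KEY (from_country, to_country)
       )""",
]

# Rebuild the DWH from staging on every run (assumed full-refresh strategy).
TRANSFORM_SQL = [
    "TRUNCATE dwh.corridor, dwh.country",
    """INSERT INTO dwh.country (country_name)
       SELECT from_country FROM staging.migration
       UNION SELECT to_country FROM staging.migration
       UNION SELECT from_country FROM staging.remittance
       UNION SELECT to_country FROM staging.remittance""",
    # Remittances flow destination -> origin, so the pair is reversed.
    """INSERT INTO dwh.corridor
           (from_country, to_country, remittance_value, migration_value)
       SELECT COALESCE(m.from_country, r.to_country),
              COALESCE(m.to_country, r.from_country),
              r.value,
              m.value
       FROM staging.migration AS m
       FULL OUTER JOIN staging.remittance AS r
         ON r.from_country = m.to_country
        AND r.to_country = m.from_country""",
]


def run_sql(conn: Any, statements: Iterable[str]) -> None:
    """Execute statements in order through a DB-API connection, then commit once."""
    cursor = conn.cursor()
    for statement in statements:
        cursor.execute(statement)
    conn.commit()
    cursor.close()
```

A run would call `run_sql(conn, STAGING_DDL + DWH_DDL)` once, load the staging tables, then call `run_sql(conn, TRANSFORM_SQL)`. If a feed can contain the same country pair more than once, aggregate it with GROUP BY before the join; otherwise the corridor primary key rejects the duplicates.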
NOTE: Please do NOT use pandas, and no Python notebooks: just one script.
Keep it simple and understandable, with comments in your code blocks where appropriate. I also urge you NOT to pull all the data into memory; one suggestion is to use generators to achieve this.
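A minimal sketch of the generator approach suggested above: rows are pulled lazily from any iterable (for example a csv.reader or a row-producing generator) and written in fixed-size batches, so memory use depends on the batch size rather than on the file size. It assumes only a DB-API 2.0 connection; batched_rows and load_in_batches are hypothetical names.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import Any


def batched_rows(rows: Iterable[tuple], size: int) -> Iterator[list[tuple]]:
    """Yield lists of at most `size` rows, pulling lazily from `rows`."""
    if size < 1:
        raise ValueError("size must be positive")
    iterator = iter(rows)
    # islice takes the next `size` items; an empty list means we are done.
    while batch := list(islice(iterator, size)):
        yield batch


def load_in_batches(
    conn: Any, insert_sql: str, rows: Iterable[tuple], batch_size: int = 5000
) -> int:
    """Insert rows through a DB-API connection one batch at a time.

    Only one batch is held in memory at once. Returns the number of rows sent.
    """
    cursor = conn.cursor()
    total = 0
    for batch in batched_rows(rows, batch_size):
        cursor.executemany(insert_sql, batch)
        total += len(batch)
    conn.commit()  # single commit: the load succeeds or fails as a whole
    cursor.close()
    return total
```

With psycopg2 the placeholders are `%s`, e.g. `INSERT INTO staging.migration (from_country, to_country, value) VALUES (%s, %s, %s)`. For large feeds, `psycopg2.extras.execute_values` or PostgreSQL's COPY (via `cursor.copy_expert`) is usually faster than `executemany`. On Python 3.12+, `itertools.batched` can replace `batched_rows`.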
DELIVERABLES:
- Python code (with annotations and comments)
- SQL code
- Description of the implemented solution
- Answers to the questions listed above (top 10 ...)
Danny T.
- 100% (2)
- Projects completed: 2
- Freelancers worked with: 2
- Projects awarded: 100%
- Last project: 16 Jul 2018
- United Kingdom