Data extraction / scraping from site, manipulation and import into Salesforce.com

Ended at: 07/03/2019

Fixed Price

£240 (approx. $301)

Posted: 5 years ago
Proposals: 8
Remote
#2293634
OPPORTUNITY
Expired

Description

Experience Level: Expert

Estimated project duration: less than 1 week

This project is to extract and clean data from a legacy website and import it into Salesforce.com. An extractor will need to be built, and you must be able to manipulate large volumes of data (approx. 2.5 million rows).

Please read through all of the information below and review the example documents provided before responding.

The steps would be:
• Build a bot to extract data from the legacy site. We estimate there are approx. 2.5 million rows of data on the legacy site
• Clean and format the data, ensuring all formatting standards for Salesforce data imports are followed
• Use the unique identifier (member number) to find the new system record identifier (Salesforce record ID)
• QA the data by cross-checking that ‘name’ and ‘email’ in the new system match the ‘name’ extracted from the legacy system when using the unique identifier as a reference. Indicate an error if either does not match.
• Run a formula to return the correct ‘Diary record’ value from the inputs that will be provided
• Batch the cleaned and formatted data into CSV files of 20,000 rows each that are ready for import into Salesforce. These files will need to be provided
• Create ‘diary’ records: any member with activities before June 2017 will need a diary record created in Salesforce
• Create activity records in Salesforce, associating them with the correct diary (more information in the Excel spreadsheet)
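
The identifier lookup and QA steps above could be sketched roughly as follows. This is only an illustration: the field names (`member_number`, `name`, `email`, `Id`), the error wording, and the choice to ignore case and extra whitespace when comparing are assumptions, not part of the brief. Only the fields present in the legacy row are compared.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QaResult:
    member_number: str
    salesforce_id: Optional[str]
    error: Optional[str]


def normalise(value: str) -> str:
    """Collapse whitespace and ignore case so cosmetic differences are not flagged."""
    return " ".join(value.split()).casefold()


def qa_member(legacy: dict, salesforce_by_member: dict) -> QaResult:
    """Find the Salesforce record for a member number and cross-check it.

    `salesforce_by_member` is a hypothetical lookup built from a Salesforce
    export, keyed by member number.
    """
    member_number = legacy["member_number"].strip()
    record = salesforce_by_member.get(member_number)
    if record is None:
        return QaResult(member_number, None, "Member number not found in Salesforce")

    # Compare only the fields that were actually extracted from the legacy site.
    mismatched = [
        field
        for field in ("name", "email")
        if field in legacy
        and normalise(legacy[field]) != normalise(record.get(field, ""))
    ]
    error = "Mismatch: " + ", ".join(mismatched) if mismatched else None
    return QaResult(member_number, record["Id"], error)
```

Rows that come back with a non-empty `error` would be reported rather than imported.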

The output CSVs should follow the formatting standards for Bulk API processing in the Salesforce Data Loader. Examples of guidance from Salesforce on formatting (please note that these are not exhaustive):

• https://developer.salesforce.com/docs/atlas.en-us.198.0.api_asynch.meta/api_asynch/datafiles_csv_valid_record_rows.htm

• https://help.salesforce.com/articleView?id=supported_data_types.htm&type=5

• All data must be trimmed and bad characters removed
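
A minimal sketch of the trimming and 20,000-row batching, using only the Python standard library. The brief does not define ‘bad characters’, so this assumes they are Unicode control and other non-printable characters and replaces them with a space, which also flattens line breaks inside a value. The file naming scheme is likewise hypothetical.

```python
import csv
import unicodedata
from itertools import islice

BATCH_SIZE = 20_000  # rows per output file, as requested in the brief


def clean_value(value) -> str:
    """Replace control/non-printable characters with spaces, then trim."""
    text = "".join(
        " " if unicodedata.category(ch).startswith("C") else ch
        for ch in str(value)
    )
    return " ".join(text.split())


def batches(rows, size=BATCH_SIZE):
    """Yield lists of at most `size` rows without loading everything at once."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def write_batches(rows, fieldnames, prefix, size=BATCH_SIZE):
    """Write cleaned rows to <prefix>_0001.csv, <prefix>_0002.csv, and so on."""
    paths = []
    for index, chunk in enumerate(batches(rows, size), start=1):
        path = f"{prefix}_{index:04d}.csv"
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            for row in chunk:
                writer.writerow(
                    {name: clean_value(row.get(name, "")) for name in fieldnames}
                )
        paths.append(path)
    return paths
```

At roughly 2.5 million rows this would produce about 125 files per object, so streaming the rows through a generator keeps memory use flat.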

Based on my assessment of the site, the extraction bot will probably need to do the following. I have attached a PDF showing each screen and the actions required:
• Go to http://cpd.rcslt.org/admin/list_users.php?search=1
• Input credentials
• Click Login
• From the CSV dataset, input into search: Salesforce RCSLT database member number (loop through each row of this dataset for the following steps). There are approx. 30k member numbers, and not all are expected to return results.
• Click Search
• If member is not found in legacy database: record an error as ‘Member not found’ at this step
• If member is found in legacy database: the page returned will display between 1 and 3 links to separate 'Learning Diaries', each navigating to detail page(s) where the data sits.
  o Loop through each ‘Learning diary link’
  o If no diaries are found under the account, record error ‘No diaries found’ at this step
  o If there are multiple pages to the diary, links to all pages are listed in the header and footer of the page (there is no forward/back pagination)
    - From the first page (which may be the only page), extract ‘special circumstances’
    - For all diaries, loop through each page and:
      • Export the ‘activities’ table
      • Append membership number to each row
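
The control flow of the loop above might look like the sketch below. It deliberately leaves out login and HTML parsing, since the page markup is only shown in the attached PDF: `search_member` and `fetch_diary_pages` are hypothetical callables that would wrap those parts, and the dictionary keys are assumptions.

```python
MEMBER_NOT_FOUND = "Member not found"
NO_DIARIES_FOUND = "No diaries found"


def extract_member(member_number, search_member, fetch_diary_pages):
    """Collect activity rows, diary details and errors for one member number.

    search_member(member_number) -> list of diary links, [] if the member has
        no diaries, or None if the member is not in the legacy database.
    fetch_diary_pages(link) -> list of page dicts for one diary, first page
        first, each with an "activities" list of row dicts.
    """
    rows, diaries, errors = [], [], []

    diary_links = search_member(member_number)
    if diary_links is None:
        errors.append((member_number, MEMBER_NOT_FOUND))
        return rows, diaries, errors
    if not diary_links:
        errors.append((member_number, NO_DIARIES_FOUND))
        return rows, diaries, errors

    for link in diary_links:
        pages = fetch_diary_pages(link)
        first_page = pages[0] if pages else {}
        # 'Special circumstances' is read from the first page only.
        diaries.append({
            "member_number": member_number,
            "diary_link": link,
            "special_circumstances": first_page.get("special_circumstances", ""),
        })
        for page in pages:
            for activity in page.get("activities", []):
                # Append the membership number to every exported activity row.
                rows.append({**activity, "member_number": member_number})

    return rows, diaries, errors
```

Keeping the fetching behind two callables means the loop can be tested with canned pages before it is pointed at the live site.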

Please reference ‘New CPD Diary’ in your proposal to confirm that you have read through this information.

Example input and outputs are attached.

Clarification Board

06 Feb 2019

Hello Siobhan,
Do you have direct access to the legacy database?
Is your budget flexible?
Regards,
M. C.

05 Feb 2019

Hi Siobhan

The web portal must use a database in the back-end to store all the data that you want to migrate. Do you not have access to the source?

Regards
Ian
Constructive Force Ltd