
RapidMiner Data Training and Testing
£50 (approx. $69)
- Proposals: 16
- Remote
- #4443050
- Awarded
Description
Experience Level: Entry
Hi, I need someone to do the following:
1) Decision Tree Classification
Using the heart.csv data provided, construct an initial ‘default’ decision tree for classifying whether a patient has less chance, or more chance of having a heart attack, and summarise the performance of the model.
Improve the performance by changing the decision tree parameters. Suggest reasons why the performance has improved, and explain the results.
Document your process step by step with screenshots and outputs, including an explanation of the final parameters you used and the effect they had on your model.
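Part of improving the tree is choosing how candidate splits are scored. The sketch below is a plain-Python illustration, not RapidMiner's implementation: it computes information gain (entropy based) and Gini impurity reduction for one hypothetical split of the 0/1 `output` label. The label lists and the `exng` test are made up for illustration; in RapidMiner the choice is made through the decision tree's criterion parameter, which also offers options such as gain ratio.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of the child branches."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


if __name__ == "__main__":
    # Hypothetical 'output' labels (0 = less chance, 1 = more chance) before a split
    parent = [0, 0, 0, 0, 1, 1, 1, 1]
    # Branches produced by a hypothetical test such as exng == 1
    left, right = [0, 0, 0, 1], [0, 1, 1, 1]
    print(round(split_gain(parent, [left, right], entropy), 4))  # prints 0.1887
    print(round(split_gain(parent, [left, right], gini), 4))     # prints 0.125
```

A weak split like this one is the kind a stricter pre-pruning threshold (for example RapidMiner's minimal gain parameter) would reject, which is one way parameter changes make a tree smaller and less prone to overfitting.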
Description of the Heart dataset:
• Age : Age of the patient
• Sex : Sex of the patient
• exng: exercise-induced angina (1 = yes; 0 = no)
• caa: number of major vessels coloured by fluoroscopy (0-3)
• cp: chest pain type
o Value 1: typical angina
o Value 2: atypical angina
o Value 3: non-anginal pain
o Value 4: asymptomatic
• trtbps : resting blood pressure (in mm Hg)
• chol : cholesterol in mg/dl fetched via BMI sensor
• fbs : (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
• restecg : resting electrocardiographic results
o Value 0: normal
o Value 1: having ST-T wave abnormality
o Value 2: showing probable or definite left ventricular hypertrophy
• thalachh : maximum heart rate achieved
• oldpeak: ST depression induced by exercise relative to rest (-2.6 to 6.2)
• slp: the slope of the peak exercise ST segment (0 = upsloping, 1 = flat, 2 = downsloping)
• thall: 0 = normal; 1 = fixed defect; 2 = reversible defect
• output (i.e. the label): 0 = less chance of heart attack; 1 = more chance of heart attack
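To summarise the model's performance, RapidMiner's Performance operator reports accuracy and a confusion matrix. The arithmetic behind those figures can be sketched in plain Python as below; the `actual` and `predicted` lists are hypothetical, and 1 (more chance of heart attack) is treated as the positive class. For this dataset, class recall for 1 is arguably worth reporting alongside accuracy, since a missed high-risk patient (a false negative) is usually the costlier error.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) counts for a binary label."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted high risk, actually high risk
            else:
                fp += 1  # predicted high risk, actually low risk
        elif a == positive:
            fn += 1      # missed a high-risk patient
        else:
            tn += 1      # correctly predicted low risk
    return tp, fp, tn, fn


def summarise(actual, predicted, positive=1):
    """Accuracy plus recall and precision for the positive class."""
    tp, fp, tn, fn = confusion_matrix(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall_1": tp / (tp + fn) if tp + fn else 0.0,
        "precision_1": tp / (tp + fp) if tp + fp else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical true labels and tree predictions for eight patients
    actual = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [1, 0, 1, 0, 1, 1, 1, 0]
    print(confusion_matrix(actual, predicted))  # prints (3, 2, 2, 1)
    print(summarise(actual, predicted))
```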
2) k-Nearest Neighbour Classification
Use the DataSet_Scoring.csv and DataSet_Training.csv files provided on BB.
First, use DataSet_Training.csv to build and evaluate the k-NN algorithm.
The dataset relates to pupils in a sports academy. The trainer has worked with athletes over the years and has developed an extensive dataset. He now wonders whether he can use the past performance of previous clients to predict the prime sport of up-and-coming high school athletes. By evaluating each athlete’s performance across a range of tests, he hopes to work out the sport for which each athlete has the highest aptitude.
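k-NN classifies an athlete by finding the most similar past athletes, usually by Euclidean distance. Because the attributes use very different scales (Agility runs 0-100, Vision only 0-4), raw distances are dominated by the wide-range attributes, so normalising first is usually worthwhile (in RapidMiner that would be a separate step, for example a Normalize operator). The sketch below is an illustration only: it min-max scales three attributes using the ranges given in the description and compares raw and normalised distances for two hypothetical athletes.

```python
from math import sqrt

# Documented ranges for a hypothetical subset of the attributes
RANGES = {"Strength": (1, 10), "Vision": (0, 4), "Agility": (0, 100)}


def normalise(athlete):
    """Min-max scale each attribute in RANGES to the interval [0, 1]."""
    return {name: (athlete[name] - lo) / (hi - lo) for name, (lo, hi) in RANGES.items()}


def euclidean(a, b):
    """Euclidean distance over the attributes listed in RANGES."""
    return sqrt(sum((a[name] - b[name]) ** 2 for name in RANGES))


if __name__ == "__main__":
    # Two hypothetical athletes: same strength, very different vision, similar agility
    a = {"Strength": 5, "Vision": 4, "Agility": 50}
    b = {"Strength": 5, "Vision": 0, "Agility": 60}
    print(euclidean(a, b))                        # raw: the 10-point Agility gap dominates
    print(euclidean(normalise(a), normalise(b)))  # scaled: the Vision gap now dominates
```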
The dataset consists of:
Age (in years)
Strength (ranking in weight-lifting exercises on a scale of 1-10, 10 being the highest)
Quickness (performance in speed ‘buzzer’ tests on a scale of 0-6, 6 being the fastest)
Injury: 0/1 (1 = serious injuries that took more than 3 weeks to heal)
Vision: scores of 0-4, with 4 being perfect vision
Endurance: scale of 0-10, with 10 being best
Agility: scale of 0-100, with 100 being highest
Decision_Making: 3-100 (note: the data contains some erroneous values that need to be filtered out using the ‘Filter Examples’ operator)
Prime_sport: the sport each athlete went on to specialise in (Football, Basketball, Baseball or Hockey)
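The Filter Examples step can be pictured as a range check on Decision_Making. The plain-Python sketch below assumes, since the posting does not say what the errors look like, that an erroneous value is anything missing, non-numeric, or outside the documented 3-100 range. The column names are assumed to match the description, and the sample rows are hypothetical.

```python
import csv
import io

VALID_RANGE = (3, 100)  # documented range of Decision_Making


def filter_examples(rows, attribute="Decision_Making", valid=VALID_RANGE):
    """Keep only rows whose attribute parses as a number inside the valid range."""
    lo, hi = valid
    kept = []
    for row in rows:
        try:
            value = float(row[attribute])
        except (KeyError, ValueError):
            continue  # missing or non-numeric value: treated as erroneous (assumption)
        if lo <= value <= hi:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Hypothetical sample shaped like DataSet_Training.csv (most columns omitted)
    sample = io.StringIO(
        "Age,Decision_Making,Prime_sport\n"
        "17,45,Football\n"
        "16,0,Hockey\n"
        "18,150,Baseball\n"
        "17,88,Basketball\n"
        "16,,Hockey\n"
    )
    rows = list(csv.DictReader(sample))
    print([r["Prime_sport"] for r in filter_examples(rows)])  # prints ['Football', 'Basketball']
```

The same filter has to be applied to DataSet_Scoring.csv before scoring, as step 3 below requires.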
1) First, use DataSet_Training.csv to build and evaluate the k-NN algorithm (note that the data needs to be filtered to remove errors, as explained above!)
2) What is the accuracy of the model?
3) Next, remove the Split Validation operator, bring the k-NN operator to the main pane, read in the second file (DataSet_Scoring.csv), filter it for errors, and connect it to the ‘Apply Model’ operator to generate k-NN model predictions and confidences.
4) What is the confidence level for each prediction?
5) What happens when you change k to 2? Or to 3?
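Steps 3 to 5 can be illustrated with a minimal, unweighted k-NN in plain Python: the prediction is the majority label among the k nearest training examples, and each label's confidence is the share of those neighbours carrying it. This is a sketch, not RapidMiner's operator; RapidMiner's k-NN also offers distance-weighted voting, so its confidences may differ. The feature tuples are hypothetical and assumed to be already filtered and normalised.

```python
from collections import Counter
from math import dist


def knn_predict(training, query, k):
    """Unweighted k-NN vote.

    training: list of (feature_tuple, label) pairs (assumed filtered and normalised).
    Returns (predicted_label, {label: confidence}), where confidence is the
    fraction of the k nearest neighbours that carry the label.
    """
    # Sort training examples by Euclidean distance to the query and keep the k closest
    neighbours = sorted(training, key=lambda example: dist(example[0], query))[:k]
    # Counter keeps labels in first-seen order, i.e. the nearest neighbour's label first
    votes = Counter(label for _, label in neighbours)
    confidences = {label: count / k for label, count in votes.items()}
    # max() returns the first of equally voted labels, so ties go to the label
    # whose representative is nearest to the query
    prediction = max(confidences, key=confidences.get)
    return prediction, confidences


# Hypothetical, already-normalised training points (two features each)
TRAINING = [
    ((0.10, 0.20), "Football"),
    ((0.15, 0.25), "Football"),
    ((0.80, 0.90), "Hockey"),
    ((0.20, 0.10), "Basketball"),
    ((0.85, 0.80), "Hockey"),
]

if __name__ == "__main__":
    query = (0.16, 0.15)  # hypothetical athlete from the scoring set
    for k in (1, 2, 3):
        print(k, knn_predict(TRAINING, query, k))
```

With these points, k = 1 predicts Basketball with full confidence, k = 2 produces a 0.5/0.5 tie that this sketch resolves in favour of the nearest neighbour's label, and k = 3 flips the prediction to Football with confidence 2/3. Even values of k make two-way ties easy to hit; with four sports, odd k can still tie (e.g. 1-1-1 at k = 3), so tie handling is worth checking.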
I need this by tomorrow afternoon, before 3 pm London time.
Client: AOTM ePOS L. (United Kingdom)
Last project: 14 Dec 2025