**Question**

**Dataset**

The records are stored in a text file named "medical_records.data". Each row corresponds to a patient record. The diagnosis is the attribute to be predicted. In this dataset, the diagnosis is the second field and is either B (benign) or M (malignant). There are 32 attributes in total (ID, diagnosis, and 30 real-valued input features).

**Attribute Information**

1. ID number

2. Diagnosis (M = malignant, B = benign)

3-32. Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)

b) texture (standard deviation of gray-scale values)

c) perimeter

d) area

e) smoothness (local variation in radius lengths)

f) compactness (perimeter^2 / area - 1.0)

g) concavity (severity of concave portions of the contour)

h) concave points (number of concave portions of the contour)

i) symmetry

j) fractal dimension ("coastline approximation" - 1)

The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30
features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius. All feature values are recorded with four significant digits.
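Assuming the data file is comma-separated with the 32 fields in the order above (as in the standard WDBC data file), loading it could be sketched as follows; `prepare_dataset` is a hypothetical helper name, not part of the provided file:

```python
import numpy as np

def prepare_dataset(dataset_path):
    """Read medical_records.data: each row is ID, diagnosis (B/M), 30 features.

    Returns X (num_records x 30 array of floats) and y (1 = malignant, 0 = benign).
    """
    X, y = [], []
    with open(dataset_path) as f:
        for line in f:
            fields = line.strip().split(",")
            if len(fields) < 32:
                continue  # skip blank or malformed lines
            y.append(1 if fields[1] == "M" else 0)  # diagnosis is the second field
            X.append([float(v) for v in fields[2:]])  # the 30 real-valued features
    return np.array(X), np.array(y)
```

The ID column is deliberately dropped, since it carries no predictive information.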

Complete the build_*_classifier functions. There are four functions of this type to implement in the provided Python file (nearest neighbours, decision trees, neural networks, and support vector machines).

These classifiers have hyperparameters that affect their capacity/complexity; you should use cross-validation to estimate the best value of one such hyperparameter for each type of classifier. In this assignment, for the sake of time, we consider only one hyperparameter per classifier. You are free to choose the hyperparameter, but here are some suggestions:

- nearest neighbours → number of neighbours
- decision trees → maximum depth of the tree or minimum size of a leaf
- support vector machine → parameter C
- neural networks → number of neurons in the hidden layers
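As an illustration, the neighbour-count search for the nearest-neighbours classifier can be sketched with sklearn's `cross_val_score`; this uses sklearn's bundled breast-cancer dataset as a stand-in for medical_records.data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for the assignment's medical_records.data (same 30 WDBC features).
X, y = load_breast_cancer(return_X_y=True)

# Try a range of neighbour counts and keep the one with the best
# mean 5-fold cross-validated accuracy.
best_k, best_score = None, -1.0
for k in range(1, 16):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    if scores.mean() > best_score:
        best_k, best_score = k, scores.mean()
print(best_k, round(best_score, 3))
```

The same loop structure works for any of the suggested hyperparameters; only the estimator and the range change.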

You have to split the whole dataset into training, validation, and test sets. You should report the prediction errors on train_data as well as on validation_data and test_data. These errors are best reported in tables and figures. You are encouraged to use all the available functions of the sklearn and tensorflow/keras libraries.
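A minimal sketch of the three-way split, again using sklearn's bundled breast-cancer dataset as a stand-in: two chained calls to `train_test_split` first carve off a test set, then a validation set from what remains.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

# Hold out 20% as the test set, then 20% of the remainder as validation.
# stratify keeps the benign/malignant ratio the same in every subset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=0, stratify=y_train)
```

The exact proportions are a judgment call; 64/16/20 is a common default.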

**Solution**

**Code**

1. The structure below is used to find the best maximum depth of the tree and minimum leaf size.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def build_DecisionTree_classifier(X_training, y_training, x_test, y_test):
    ## "INSERT YOUR CODE HERE"
    maxAcc = 0
    best_clf = None
    # Grid-search over maximum tree depth and minimum leaf size,
    # keeping the classifier with the highest accuracy on the test set.
    for depth in range(1, 9):
        for leaf in range(1, 20):
            clf_entr = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                              max_depth=depth, min_samples_leaf=leaf)
            clf_entr.fit(X_training, y_training)
            y_pred = clf_entr.predict(x_test)
            Acc = accuracy_score(y_test, y_pred) * 100
            if Acc > maxAcc:
                maxAcc = Acc
                best_clf = clf_entr
    return best_clf
```
