Machine Learning NumPy Array

Question

Dataset

The records are stored in a text file named “medical_records.data”. Each row corresponds to a patient record. The diagnosis is the attribute to be predicted. In this dataset, the diagnosis is the second field and is either B (benign) or M (malignant). There are 32 attributes in total (ID, diagnosis, and 30 real-valued input features).

Attribute Information
1. ID number
2. Diagnosis (M = malignant, B = benign)
3-32. Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, field 23 is Worst Radius. All feature values are recorded with four significant digits.

Your Tasks

Preprocessing: Complete the function prepare_dataset that loads the data records from the text file and converts the information to NumPy arrays.
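One way this could look, as a minimal sketch: it assumes the file is comma-separated with the ID in the first field and the diagnosis in the second, and encodes M as 1 and B as 0 (the encoding and everything beyond the name prepare_dataset are illustrative assumptions, not part of the brief).

import numpy as np

def prepare_dataset(dataset_path):
    # Read every field as a string first; the diagnosis column is not numeric.
    raw = np.genfromtxt(dataset_path, delimiter=',', dtype=str)
    # Field 2 (index 1) is the diagnosis: encode M (malignant) as 1, B (benign) as 0.
    y = np.where(raw[:, 1] == 'M', 1, 0)
    # Fields 3-32 (indices 2..31) are the 30 real-valued features.
    X = raw[:, 2:].astype(float)
    return X, y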
Build classifiers
Complete the build_*_classifier functions. There are four functions of this type to implement in the provided Python file (nearest neighbours, decision trees, neural networks, and support vector machines).
These classifiers have hyperparameters that affect the capacity/complexity of the classifier, so you should use cross-validation to estimate the best value of one of these hyperparameters for each type of classifier. In this assignment, for the sake of time, we only consider one hyperparameter per classifier. You are free to choose the hyperparameter, but here are some suggestions (a sketch of the cross-validation step follows the list):
  1. nearest neighbours → number of neighbours
  2. decision trees → maximum depth of the tree or minimum size of a leaf
  3. support vector machine → parameter C
  4. neural networks → number of neurons in the hidden layers
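As a hedged illustration of the cross-validation step, here is one way the SVM's C parameter could be tuned with sklearn's GridSearchCV; the names X_train and y_train are assumptions carried over from a split like the one sketched further below.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Cross-validate C on the training split only (X_train/y_train assumed to exist).
grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [0.01, 0.1, 1, 10, 100]},
                    cv=5)
grid.fit(X_train, y_train)
print("best C:", grid.best_params_["C"])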

You have to split the whole dataset into training, validation, and test sets. You should report the prediction errors on train_data as well as on validation_data and test_data; these errors are best reported in tables and figures. You are encouraged to use all the available functions of the sklearn and tensorflow/keras libraries. One possible split is sketched below.
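A minimal sketch of a 60/20/20 split, assuming the prepare_dataset function from the preprocessing step; the exact proportions and the random_state are illustrative choices, not requirements of the assignment.

from sklearn.model_selection import train_test_split

X, y = prepare_dataset("medical_records.data")
# First carve off 40%, then split that portion half-and-half into validation and test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42, stratify=y_rest)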

Solution:
Code

1. The structure below is used to search over the maximum depth of the tree and the minimum size of a leaf, keeping the most accurate tree found.
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def build_DecisionTree_classifier(X_training, y_training, x_test, y_test):
    max_acc = 0
    best_clf = None
    # Grid-search the tree depth and the minimum leaf size.
    for depth in range(1, 9):
        for leaf in range(1, 20):
            clf_entr = DecisionTreeClassifier(criterion="entropy",
                                              random_state=100,
                                              max_depth=depth,
                                              min_samples_leaf=leaf)
            clf_entr.fit(X_training, y_training)
            y_pred = clf_entr.predict(x_test)
            acc = accuracy_score(y_test, y_pred) * 100
            # Keep the most accurate tree seen so far.
            if acc > max_acc:
                max_acc = acc
                best_clf = clf_entr
    return best_clf
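A hypothetical usage, reusing the X_train/X_val/X_test names from the split sketched earlier; tuning against the validation split keeps the test split untouched until the final report.

clf = build_DecisionTree_classifier(X_train, y_train, X_val, y_val)
# Report the held-out test error only once, after tuning is finished.
test_acc = accuracy_score(y_test, clf.predict(X_test)) * 100
print(f"test accuracy: {test_acc:.1f}%")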
