The diagnosis and segmentation of tumors using any medical diagnostic tool can be challenging due to the varying nature of this pathology. Magnetic Resonance Imaging (MRI) is an established diagnostic tool for various diseases and disorders, and plays a major role in clinical neuro-diagnosis. Supplementing this technique with automated classification and segmentation tools is gaining importance as a way to reduce errors and the time needed to make a conclusive diagnosis. In this paper a simple three-step algorithm is proposed:
(1) identification of patients that present with tumors,
(2) automatic selection of the abnormal slices of these patients, and
(3) segmentation and detection of the tumor.
Features were extracted by using discrete wavelet transform on the normalized images, and classified by support vector machine (for step (1)) and random forest (for step (2)). The 400 subjects were divided in a 3:1 ratio between training and test with no overlap.
The human brain is formidably complex, and its appearance varies with a host of factors such as age, gender, ethnicity and personal medical history. Brain abnormalities, whether degenerative, infectious, ischemic or malignant, are diagnosed using Magnetic Resonance Imaging (MRI), which is an effective, standardized neuro-imaging tool. A routine brain imaging protocol includes T1-weighted, T2-weighted, fluid-attenuated inversion recovery (FLAIR) and gadolinium-enhanced T1-weighted images. The mode of data acquisition is gradually shifting from two-dimensional to three-dimensional imaging. This results in a large volume of data per patient, the analysis of which is both time consuming and prone to error. Computer-aided detection is therefore desirable as an aid to the radiologist.
Tumors are atypical cells multiplying out of control. They may vary in size, location and type, and they show a spectrum of atypia from benign to malignant. A tumor is usually variegated, with high-grade and low-grade tumor cells, necrosis and edema. It is therefore difficult to train a computational system to identify and segment the region of interest, which is why tumors were chosen as the pathology of interest in this work.
Various methods are used for automated disease classification and tumor segmentation, each with its own restrictions. The pathologies most commonly studied for classification purposes are degenerative diseases such as Alzheimer’s [1, 2, 3, 4], Parkinson’s and schizophrenia. These affect the entire brain and pose less of a challenge for classification, as the affected brain differs significantly from normal physiology. The studies are primarily limited by small datasets [6, 7, 8, 9, 10], mostly taken from medical libraries that are available on the internet, like the Harvard School Medical Library [11, 12, 13, 14, 15]. Moreover, the data sets often lack header information. In most cases, the images used for classification and segmentation are taken as one slice per patient from a two-dimensional scan set [16, 10, 15]. However, a lesion usually does not appear in only one slice, so this practice limits the training and test data to certain slices of the brain and makes the methods difficult to use in a clinical scenario.
Methods like Multi-Geometric Analysis (MGA) and entropy-based features using the discrete wavelet transform have been used for feature extraction. The detailed extraction methodology and the normalization technique used for the images are not discussed in most of the studies. Principal Component Analysis (PCA) is a commonly used unsupervised method for feature selection and reduction, but it increases the computation time and can eliminate data that are essential for classification.
Most studies use PCA without justification or mention of the computational time [15, 10, 17, 18]. Many classification techniques, such as support vector machine (SVM) [15, 1, 3, 13, 19, 2, 20, 21, 4], artificial neural network (ANN) [16, 11, 17], probabilistic neural network (PNN), linear discriminant analysis (LDA) and k-nearest neighbour (k-NN), have been employed based on their own merits. Template comparison to a normal atlas of the human brain has also been used for classification; this approach would be limited to certain data and may not correctly classify normal variants of physiology. The patient data under analysis is linear in nature, and therefore classifiers such as linear SVM and random forest are useful. The success of the classification techniques depends on the available feature set and on the pre-processing applied to the images.
Convolutional neural networks (CNN) [24, 25, 26], k-NN, fuzzy networks and other feature-based techniques are used for segmentation; these require a large database, usually acquired from the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) 2013. Methods like the fuzzy c-means algorithm produce multiple segmented images depending on the number of clusters chosen, and thus the final segmented image must be manually selected. Other methods use a normal template, which may vary in intensity or represent a normal variant, and may therefore not be very accurate for tumor segmentation.
This work has been divided into three sections to systematically segment tumors. The first section separates patients with tumors (group1) from patients without tumors (group2), the second further filters group1 down to the slices with tumors, and the third segments the tumor and stores the 3D data with the marked region of interest. The process flow for the proposed methodology is shown in figure 1.
2.1. Data used
Data was collected from a single setup with all imaging performed on a 3-Tesla MR scanner (Philips Healthcare, Netherlands) under the expert supervision of a neuroradiologist (R.K.G.) with more than 30 years of experience.
Test subjects were classified into normal, referred to as group2 (n=200) and with findings, referred to as group1 (n=200). Training and testing groups were randomly selected from each group in the ratio of 3:1, respectively and without duplication.
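The 3:1 split without duplication can be illustrated with a minimal sketch. The subject identifiers, the seeds and the function name `split_group` are hypothetical and are not taken from the study; only the group sizes and the ratio come from the text.

```python
import random

def split_group(subject_ids, train_fraction=0.75, seed=0):
    """Shuffle subject IDs and split them into disjoint training and test lists."""
    ids = list(subject_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# Hypothetical identifiers for the 200 subjects of each group.
group1_ids = [f"g1_{i:03d}" for i in range(200)]
group2_ids = [f"g2_{i:03d}" for i in range(200)]

train1, test1 = split_group(group1_ids, seed=1)
train2, test2 = split_group(group2_ids, seed=2)
print(len(train1), len(test1), len(train2), len(test2))
```

Splitting each group separately keeps the two classes balanced in both the training and the test set.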
Age of group1 ranged from 2 to 82 years with a mean of 43.64 ± 18.41 years; the male:female ratio was 1.82. Among these, the training group had a male:female ratio of 1.68 with a mean age of 32.66 ± 18.01 and the test group had a male:female ratio of 2.33 with a mean age of 43.92 ± 21.44. Age of group2 ranged from 4 to 71 years with a mean of 35.36 ± 14.44 years; the male:female ratio was 1.27. Among these, the training group had a male:female ratio of 1.17 with a mean age of 27.01 ± 14.71 and the test group had a male:female ratio of 1.63 with a mean age of 33.42 ± 16. Tumors varied in size, shape, location, tumor grade and multiplicity.
Volumetric T2-FLAIR and volumetric (or 2D) T2-weighted images were acquired. T2-weighted MR images use long TE (echo time) and TR (repetition time) to distinguish water from fat; water has the longer relaxation times. FLAIR is a T2-based pulse sequence which nulls the fluid signal and displays the pathology prominently.
All images were normalized to a 64 x 64 matrix with 10 mm slice thickness, yielding a homogeneous set of 16 slices per patient.
2.2. Image Post-processing
The image post-processing was the same for all three steps and is described in this section.
The patient data, either volumetric or 2-dimensional, was normalized using the SPM8 toolbox in MATLAB with the respective T2.nii template. The size taken for normalization was [2,2,10], which indicates that the image size for each slice is 64x64 with a 10 mm slice gap, thus generating a total of 16 slices per patient. The volumetric normalization was done as per equations 1 to 4. Volumetric normalization maximizes the overlap of voxels of the image being processed, X, and the template, X0, as seen in equation 1, where T is a rigid body transformation.
If F(X0) is the set of intensities for the overlapped voxels in X0 with mean f̄, and G(X0) is the set of intensities for the overlapped voxels of X with mean ḡ, the normalized correlation coefficient (NCC) is given in equation 2.
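As an illustration of the similarity measure, the sketch below computes a normalized correlation coefficient between two sets of overlapped voxel intensities. It assumes the standard definition (mean-centred cross-correlation divided by the product of the centred norms); the function name `normalized_correlation` is hypothetical, and this is not the SPM8 implementation.

```python
import numpy as np

def normalized_correlation(f, g):
    """Normalized correlation coefficient between two intensity sets of equal size."""
    f = np.asarray(f, dtype=float).ravel()
    g = np.asarray(g, dtype=float).ravel()
    f_c = f - f.mean()  # subtract the mean of the template intensities
    g_c = g - g.mean()  # subtract the mean of the image intensities
    denom = np.sqrt(np.sum(f_c ** 2) * np.sum(g_c ** 2))
    if denom == 0:
        return 0.0  # at least one input is constant, correlation is undefined
    return float(np.sum(f_c * g_c) / denom)

template = np.array([0.1, 0.4, 0.5, 0.9])
print(normalized_correlation(template, 2.0 * template + 3.0))
print(normalized_correlation(template, -template))
```

Under this definition the value lies between -1 and 1 and is unaffected by a linear rescaling of either image, which is why it is a common choice when the image and the template differ in overall brightness.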
The first two and the last two slices were discarded as they contain very little useful information. Intensity normalization was performed for each slice of each patient: the intensity range was changed to 0-1 by dividing the value of each pixel by the highest pixel value of the slice, as seen in equation 5, where x is the pixel value and i and j represent the row and column of the image, respectively. The data was then compiled to yield a homogeneous set of 12 slices for further processing and classification.
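A minimal sketch of this step is given below, assuming the normalized volume is held as an array of shape (slices, rows, columns). The function name `trim_and_normalize` and the handling of empty slices are assumptions made for illustration.

```python
import numpy as np

def trim_and_normalize(volume, n_discard=2):
    """Drop the first and last n_discard slices, then scale each slice to the range 0-1."""
    vol = np.asarray(volume, dtype=float)
    kept = vol[n_discard:vol.shape[0] - n_discard]
    peak = kept.max(axis=(1, 2), keepdims=True)  # highest pixel value of each slice
    safe_peak = np.where(peak > 0, peak, 1.0)    # assumption: leave all-zero slices unchanged
    return kept / safe_peak

rng = np.random.default_rng(0)
volume = rng.random((16, 64, 64)) * 4095.0  # synthetic stand-in for one patient
normalized = trim_and_normalize(volume)
print(normalized.shape)
```

With 16 input slices and two slices discarded at each end, the result holds the 12 slices used in the later stages.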
2.2.2. Feature Extraction
Each slice was divided into 2x2 blocks, and the mean of the voxel values in each block was taken as the feature for that block, as shown in equation 6. Therefore, 32x32 (64x64/2x2) features were obtained per slice and 32x32x12 features were obtained for each patient.
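The block averaging can be sketched as follows. The function name `block_mean_features` is hypothetical, and non-overlapping blocks are assumed, which is what the count of 32x32 features per slice implies.

```python
import numpy as np

def block_mean_features(slice_2d, block=2):
    """Mean of each non-overlapping block x block patch of a 2D slice."""
    img = np.asarray(slice_2d, dtype=float)
    rows, cols = img.shape
    if rows % block or cols % block:
        raise ValueError("slice dimensions must be multiples of the block size")
    # Split rows and columns into (number of blocks, block size) and average within blocks.
    return img.reshape(rows // block, block, cols // block, block).mean(axis=(1, 3))

small = np.arange(16, dtype=float).reshape(4, 4)
print(block_mean_features(small))
print(block_mean_features(np.zeros((64, 64))).shape)
```

Applied to a 64x64 slice this gives a 32x32 feature map, so a patient with 12 slices contributes 32x32x12 features.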
2.3. Patient classification by Support Vector Machine
Support Vector Machines (SVMs) are based on the concept of decision planes that define boundaries between different classes of objects. A decision plane is one that separates sets of objects having different class memberships. There are different types of decision planes, such as linear, quadratic and polynomial, which fit data into different classes for classification [28, 2, 29, 30, 31, 32, 33, 34]. For the linear kernel, the hyperplane is defined by equations 7 and 8,
where w is a weight vector, x is the input vector and b is a bias. d is the margin of separation between the hyperplane and the closest data point for a given weight w and bias b. The optimal decision plane is the one which maximizes the margin of separation d.
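To make the roles of w, b and d concrete, the sketch below evaluates a linear decision function and the distance from a hyperplane to its closest data point. It assumes the standard linear form w·x + b = 0; the function names and the numbers are illustrative only, and no training is performed.

```python
import numpy as np

def decision_values(X, w, b):
    """Signed value of w.x + b for every row of X; the sign gives the predicted class."""
    return np.asarray(X, dtype=float) @ np.asarray(w, dtype=float) + b

def margin_of_separation(X, w, b):
    """Distance d from the hyperplane w.x + b = 0 to the closest row of X."""
    return float(np.min(np.abs(decision_values(X, w, b))) / np.linalg.norm(w))

# Two toy points on either side of the line x1 + x2 = 3.
X = np.array([[1.0, 1.0], [3.0, 2.0]])
w = np.array([1.0, 1.0])
b = -3.0
print(np.sign(decision_values(X, w, b)))
print(margin_of_separation(X, w, b))
```

An SVM solver searches over w and b for the separating hyperplane with the largest such distance.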
2.4. Slice Identification
Group 1 patients were processed at this stage to identify the slices of interest, with each slice considered as a separate feature set. In the resulting feature matrix, the number of rows is equal to the product of the number of patients and the number of slices (12), and the number of columns is equal to the square of 32.
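A short sketch of how such a matrix could be assembled, assuming the per-patient features are stored as an array of shape (patients, 12, 32, 32). The function name `slice_feature_matrix` and the row ordering (all slices of one patient, then the next patient) are assumptions.

```python
import numpy as np

def slice_feature_matrix(patient_features):
    """Reshape (patients, slices, 32, 32) features into one row per slice."""
    feats = np.asarray(patient_features, dtype=float)
    n_patients, n_slices, height, width = feats.shape
    return feats.reshape(n_patients * n_slices, height * width)

rng = np.random.default_rng(0)
features = rng.random((5, 12, 32, 32))  # synthetic features for five patients
matrix = slice_feature_matrix(features)
print(matrix.shape)
```

With this ordering, row p * 12 + s holds the 1024 features of slice s of patient p, so a predicted label can be mapped back to its patient and slice number.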
2.4.1. Random Forest Classifier
Random forest is a learning method for classification, regression and other tasks. It operates by constructing a multitude of decision trees during training and, for classification, outputs the class predicted by the majority of the trees. Combining multiple classification or regression trees in this way improves the accuracy of training and therefore of classification [35, 36, 37]. The slices that are used differ from each other significantly, and thus a linear SVM was insufficient for this classification. A random forest with 25 trees was used to obtain a high sensitivity.
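The way a forest turns the outputs of its trees into one class can be sketched with a simple majority vote. The tree predictions below are made up for illustration, and the sketch does not cover how the individual trees are trained; `forest_vote` is a hypothetical name.

```python
from collections import Counter

def forest_vote(tree_predictions):
    """Majority vote over per-tree label lists (one label per sample in each list)."""
    n_samples = len(tree_predictions[0])
    votes = []
    for i in range(n_samples):
        counts = Counter(tree[i] for tree in tree_predictions)
        votes.append(counts.most_common(1)[0][0])
    return votes

# Three hypothetical trees labelling three slices (1 = abnormal, 0 = normal).
trees = [[1, 0, 0], [1, 0, 1], [0, 0, 1]]
print(forest_vote(trees))
```

With an odd number of trees, such as the 25 used here, a two-class vote cannot end in a tie.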
2.5. Tumor Segmentation
With the abnormal slices known for each patient, the data is processed for tumor segmentation. The data used is the normalized T2 data, which has undergone intensity normalization and has been re-sized to 64 x 64 for ease of processing.
2.5.1. Discrete Wavelet Transform (DWT)
DWT represents an image in the time-frequency domain by decomposing it with low-pass and high-pass filters. The function in its discrete form is given by the following equations:
l(n) and h(n) are the low-pass and high-pass filters, respectively. cAj,k represents the approximation components, which contain the low-frequency information, and cDj,k represents the detail components, which contain the high-frequency details of the image, essentially its edges [38, 39, 40, 41, 14]. Tumor segmentation requires an approximation component image as a base contrast of the tumor. Eliminating the high-frequency components removes the edges, including skull patterns, which might cause errors in segmentation. The approximation image is re-sized to 64x64 and used for tumor segmentation.
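A single-level 2D decomposition can be sketched as below. The wavelet family is not stated in the text, so the Haar wavelet is assumed here purely for illustration, with low-pass filter (1/√2, 1/√2) and high-pass filter (1/√2, -1/√2); `haar_dwt2` is a hypothetical name and sign conventions for the detail bands differ between libraries.

```python
import numpy as np

def haar_dwt2(image):
    """Single-level 2D Haar DWT: returns the approximation and three detail sub-bands."""
    img = np.asarray(image, dtype=float)
    if img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("image dimensions must be even")
    s = np.sqrt(2.0)
    # Filter and downsample along the columns of each row.
    low = (img[:, 0::2] + img[:, 1::2]) / s
    high = (img[:, 0::2] - img[:, 1::2]) / s
    # Filter and downsample along the rows of each intermediate result.
    cA = (low[0::2, :] + low[1::2, :]) / s    # approximation (low-low)
    cH = (low[0::2, :] - low[1::2, :]) / s    # detail across rows
    cV = (high[0::2, :] + high[1::2, :]) / s  # detail across columns
    cD = (high[0::2, :] - high[1::2, :]) / s  # diagonal detail
    return cA, (cH, cV, cD)

image = np.arange(16, dtype=float).reshape(4, 4)
cA, (cH, cV, cD) = haar_dwt2(image)
print(cA)
```

Keeping only cA discards the three detail sub-bands, which is the edge-removal effect described above. A 64x64 slice yields a 32x32 approximation, which is why the result is re-sized before segmentation.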
2.5.2. Thresholding by Contralateral Comparison
The vertical symmetry, or contralateral, approach relies on the fact that the two cerebral hemispheres are comparable. The presence of a tumor distorts the symmetry of the brain, and hence this method is appropriate. Only the slices classified as containing tumor by the random forest classifier are taken and analyzed further for tumor segmentation. The steps for segmentation are as follows and are shown in figure 2:
1. If any slice demonstrates tumor, the neighboring 3 slices are analyzed to locate the tumor margins and hence verify the prediction.
2. Next, all stray tumor slices, i.e. those slices with no adjoining tumor slices, are removed, thereby removing the confounding features.
3. The remaining slices are then made continuous; for example, if slices 6-7 and 11-12 were found to qualify as above, then all slices from 6 to 12 are considered for segmentation.
4. To segment the tumor (voxels identified as tv), the right and left halves of the brain are compared to find the points where the intensity difference is above a threshold, as shown in equation 11. This is advantageous because the contralateral side serves as the control and a training set is not required.
5. A 4x4 section is created around the selected points and, if the number of points in the patch is less than a threshold, the patch is removed. This assumes that the pathology is larger than the patch size considered, so that the methodology removes smaller asymmetries which qualify as normal variants.
6. The remaining sections, which are considered to be tumor, are delineated.
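Steps 2 to 5 can be sketched as follows. Several choices here are assumptions rather than details from the text: the midline is taken to be the vertical centre of the image, the lesion is taken to be brighter than the contralateral side, the 4x4 sections are taken as non-overlapping tiles, and the thresholds are arbitrary. The function names are hypothetical, and step 1 is not covered.

```python
import numpy as np

def clean_tumor_slices(flagged):
    """Steps 2-3: drop slices with no flagged neighbour, then fill the remaining range."""
    flagged = set(flagged)
    kept = sorted(s for s in flagged if (s - 1) in flagged or (s + 1) in flagged)
    if not kept:
        return []
    return list(range(kept[0], kept[-1] + 1))

def contralateral_mask(slice_2d, diff_threshold, patch=4, min_points=8):
    """Steps 4-5: flag voxels brighter than their mirror image, then prune sparse patches."""
    img = np.asarray(slice_2d, dtype=float)
    rows, cols = img.shape
    if rows % patch or cols % patch:
        raise ValueError("image size must be a multiple of the patch size")
    mirrored = img[:, ::-1]  # left-right flip about the assumed midline
    candidates = (img - mirrored) > diff_threshold
    # Count candidate voxels inside each non-overlapping patch x patch tile.
    counts = candidates.reshape(rows // patch, patch, cols // patch, patch).sum(axis=(1, 3))
    keep = np.repeat(np.repeat(counts >= min_points, patch, axis=0), patch, axis=1)
    return candidates & keep

print(clean_tumor_slices([2, 6, 7, 11, 12]))

# Synthetic slice: uniform background, one 8x8 bright region, one isolated bright voxel.
slice_img = np.full((64, 64), 0.2)
slice_img[20:28, 40:48] = 0.9
slice_img[5, 5] = 0.9
mask = contralateral_mask(slice_img, diff_threshold=0.3)
print(int(mask.sum()))
```

In this toy example the isolated voxel is discarded because its tile holds too few candidate points, while the larger region is kept, which mirrors the intent of step 5.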
3.1. Patient Classification
In the first step, all patients were classified into group1 and group2 using SVM. The accuracy, sensitivity and specificity were calculated as per equations 12, 13 and 14 [17, 11, 42, 43]. Testing with T2WI yielded an accuracy, sensitivity and specificity of 92.00%, 90.00% and 94.00%, respectively; testing with FLAIR yielded 88.78%, 84.91% and 92.60%, respectively.
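The three measures can be computed from the confusion counts using their standard definitions, as in the sketch below. The counts are hypothetical: they were chosen to be consistent with the reported T2WI percentages under the assumption of 50 test subjects per group, and are not taken from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # fraction of tumor patients detected
        "specificity": tn / (tn + fp),  # fraction of normal patients cleared
    }

# Hypothetical counts: 50 test patients with tumor, 50 without.
metrics = classification_metrics(tp=45, tn=47, fp=3, fn=5)
print(metrics)
```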
Patients classified as group1 were subjected to the next step. The algorithm was run in MATLAB R2014a on a machine with 4 GB RAM and a 2.3 GHz i5 processor, and took 97 seconds to complete.
3.2. Slice Selection
Testing with T2WI yielded a sensitivity of 77.52%. The algorithm output the numbers of the slices that are abnormal. This result acts as the input to the next stage.
3.3. Tumor Segmentation
These slices are thereafter presented to the next step, where they undergo segmentation. The output is a NIfTI file of the T2-weighted images, which includes a white line that demarcates the tumor region, as exemplified in figure 3. The false positives from the first stage do not have any tumor regions segmented, thus increasing the specificity of the overall algorithm to 100% and the accuracy to 95%. Figure 3 shows an example of the corresponding slice for a normal patient, a patient with tumor and the post-segmentation image of the respective patient.
Automated classification of abnormal images and tumor segmentation is no longer a pre-clinical research tool. Our algorithm demonstrates that it can be applied to clinical data. To be applicable to clinical data, all slices must be included for classification with a standardized normalization procedure. In contrast to previous literature, the present approach uses a combination of classification and tumor segmentation to achieve a higher accuracy in identifying patients with tumors and in identifying the slice and area in which the tumor occurs [18, 44, 22, 45, 46].
A multi-variate data set of 400 patients (T2-weighted images) is used, in which all the images taken in the scan are considered, unlike most studies, which are limited by data [18, 22, 44, 45, 46, 47, 12] and use only one image per patient [18, 22]. Each patient set is treated as an image, thus increasing accuracy and reliability. If only one slice is given for training, it implies that a diagnostician would be required to supply the tumor image for classification, thus defeating the purpose of classification.
The entire data taken from a scan is fed into the program to give a result of pathology or physiology, as well as the slice and area in which the tumor appears. The data is taken from a clinical setup, and the parameters of the scanner and the protocols are known. The images are spatially normalized, and intensity normalization is then performed to ensure that all images are on the same scale. The details of normalization, the sequence protocol used, the details of feature extraction and the size of the feature matrix are reported only to a limited extent in past studies [48, 47, 12].
In the past, many groups have attempted to automate the classification and segmentation of brain tumors using different MR image sequences. However, the reported results have restrictions in terms of the images used and the use of training sets or templates for segmentation. In this work, a combined model has been proposed in which classification is followed by segmentation to obtain a higher accuracy. The data set is large and varied, and the training and testing sets do not overlap. The algorithm is comprehensive and effective within a short computation time.
It is a three-step process which involves identifying patients with tumor, then extracting the abnormal slices, followed by segmentation of the tumor. The entire patient data set is used for classification, treating each scan set as a single image comprising 12 slices. Passing all slices considered as abnormal through segmentation allows normal patients that have been misclassified to be correctly classified in the third step. Approximation components of the original images are mapped contralaterally for tumor segmentation, which, unlike previous studies, does not require a normal template or a training set. The overall accuracy of the proposed method is 95%, with 100% specificity and 90% sensitivity. Future work could include improving accuracy and segmenting the various cell aggregates within the tumor that differ in composition.
[1] B. Magnin, L. Mesrob, S. Kinkingnéhun, M. Pélégrini-Issac, O. Colliot, M. Sarazin, B. Dubois, S. Lehéricy, H. Benali, Support vector machine-based classification of Alzheimer’s disease from whole-brain anatomical MRI, Neuroradiology 51 (2009) 73–83.
[2] A. Ortiz, J. M. Górriz, J. Ramírez, F. Martínez-Murcia, LVQ-SVM based CAD tool applied to structural MRI for the diagnosis of the Alzheimer’s disease, Pattern Recognition Letters 34 (2013) 1725–1733.
[3] S. Klöppel, C. M. Stonnington, C. Chu, B. Draganski, R. I. Scahill, J. D. Rohrer, N. C. Fox, C. R. Jack Jr., J. Ashburner, R. S. J. Frackowiak, Automatic classification of MR scans in Alzheimer’s disease, Brain 131 (2008) 681–689.
[4] M. Liu, D. Zhang, E. Adeli-Mosabbeb, D. Shen, Inherent structure based multi-view learning with multi-template feature representation for Alzheimer’s disease diagnosis.
[5] M. Nieuwenhuis, N. E. van Haren, H. E. H. Pol, W. Cahn, R. S. Kahn, H. G. Schnack, Classification of schizophrenia patients and healthy controls from structural MRI scans in two large independent samples, NeuroImage 61 (2012) 606–612.
[6] Y. Sun, B. Bhanu, S. Bhanu, Automatic symmetry-integrated brain injury detection in MRI sequences, in: Computer Vision and Pattern Recognition Workshops, 2009. CVPR Workshops 2009. IEEE Computer Society Conference on, IEEE, 2009, pp. 79–86.
[7] M. Prastawa, E. Bullitt, S. Ho, G. Gerig, A brain tumor segmentation framework based on outlier detection, Medical Image Analysis 8 (3) (2004) 275–283.
[8] E. Abdel-Maksoud, M. Elmogy, R. Al-Awadi, Brain tumor segmentation based on a hybrid clustering technique, Egyptian Informatics Journal 16 (1) (2015) 71–81.
[9] M. B. Cuadra, C. Pollo, A. Bardera, O. Cuisenaire, J.-G. Villemure, J.-P. Thiran, Atlas-based segmentation of pathological MR brain images using a model of lesion growth, Medical Imaging, IEEE Transactions on 23 (10) (2004) 1301–1314.
[10] V. Rathi, S. Palani, Brain tumor MRI image classification with feature selection and extraction using linear discriminant analysis, arXiv preprint arXiv:1208.2128.
[11] E.-S. A. El-Dahshan, T. Hosny, A.-B. M. Salem, Hybrid intelligent techniques for MRI brain images classification, Digital Signal Processing 20 (2010) 433–441.
[12] N. Zhang, S. Ruan, S. Lebonvallet, Q. Liao, Y. Zhu, Kernel feature selection to fuse multi-spectral MRI images for brain tumor segmentation, Computer Vision and Image Understanding 115 (2) (2011) 256–269.
[13] S. Chaplot, L. M. Patnaik, N. R. Jagannathan, Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network, Biomedical Signal Processing and Control 1 (2006) 86–92.
[14] M. Saritha, K. P. Joseph, A. T. Mathew, Classification of MRI brain images using combined wavelet entropy based spider web plots and probabilistic neural network, Pattern Recognition Letters 34 (2013) 2151–2156.
[15] S. Das, M. Chowdhury, M. K. Kundu, Brain MR image classification using multi-scale geometric analysis of ripplet, Progress In Electromagnetics Research 137 (2013) 1–17.
[16] W. H. Ibrahim, A. A. A. Osman, Y. I. Mohamed, MRI brain image classification using neural networks, in: Computing, Electrical and Electronics Engineering (ICCEEE), 2013 International Conference on, IEEE, 2013, pp. 253–258.
[17] N. H. Rajini, R. Bhavani, Classification of MRI brain images using k nearest neighbor and artificial neural network, in: IEEE-ICRTIT, 2011.
[18] M. F. Othman, M. A. M. Basri, Probabilistic neural network for brain tumor classification, in: Intelligent Systems, Modelling and Simulation (ISMS), 2011 Second International Conference on, IEEE, 2011, pp. 136–138.
[19] N. K. Focke, M. Yogarajah, M. R. Symms, O. Gruber, W. Paulus, J. S. Duncan, Automated MR image classification in temporal lobe epilepsy, NeuroImage 59 (2012) 356–362.
[20] K. Machhale, H. B. Nandpuru, V. Kapur, L. Kosta, MRI brain cancer classification using hybrid classifier (SVM-KNN
[21] S. Vidhusha, K. Anandhan, Analysis and evaluation of autistic brain MR images using learning vector quantization and support vector machines, in: International conference on Industrial instrumentation and control, 2015.
[22] D. Sridhar, M. Krishna, Brain tumor classification using discrete cosine transform and probabilistic neural network, in: ICSIPR, 2013.
[23] M. Kaus, S. K. Warfield, A. Nabavi, E. Chatzidakis, P. M. Black, F. A. Jolesz, R. Kikinis, Segmentation of meningiomas and low grade gliomas in MRI, in: Medical Image Computing and Computer-Assisted Intervention–MICCAI’99, Springer, 1999, pp. 1–10.
[24] S. Pereira, A. Pinto, V. Alves, C. A. Silva, Brain tumor segmentation using convolutional neural networks in MRI images, IEEE Transactions on Medical Imaging 35 (5) (2016) 1240–1251.
[25] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin, H. Larochelle, Brain tumor segmentation with deep neural networks, Medical Image Analysis.
[26] L. Zhao, K. Jia, Multiscale CNNs for brain tumor segmentation and diagnosis, Computational and Mathematical Methods in Medicine 2016.
[27] V. S. Lee, Cardiovascular MR Imaging: Physical Principles to Practical Protocols, 1st Edition, Lippincott Williams and Wilkins, Philadelphia, PA, 2006.
[28] J. Zhou, K. Chan, V. Chong, S. M. Krishnan, Extraction of brain tumor from MR images using one-class support vector machine, in: Engineering in Medicine and Biology Society, 2005. IEEE-EMBS 2005. 27th Annual International Conference of the, IEEE, 2006, pp. 6411–6414.
[29] R. Cuingnet, J. A. Glaunès, M. Chupin, H. Benali, O. Colliot, T. A. D. N. Initiative, Spatial and anatomical regularization of SVM: A general framework for neuroimaging data, IEEE Trans. on Pattern Analysis and Machine Intelligence 35 (2013) 682–696.
[30] D. R. J. Ramteke, K. M. Y, Automatic medical image classification and abnormality detection using k-nearest neighbour, International Journal of Advanced Computer Research 2.
[31] V. P. Rathi, S. Palani, Brain tumor MRI image classification with feature selection and extraction using linear discriminant analysis, CoRR abs/1208.2128.
[32] A. J. Smola, B. Schölkopf, A tutorial on support vector regression, Statistics and Computing 14 (3) (2004) 199–222.
[33] C. J. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2 (2) (1998) 121–167.
[34] C.-C. Chang, C.-J. Lin, Training ν-support vector classifiers: theory and algorithms, Neural Computation 13 (9) (2001) 2119–2147.
[35] L. Breiman, Random forests, Machine Learning 45 (1) (2001) 5–32.
[36] S. Bernard, L. Heutte, S. Adam, On the selection of decision trees in random forests, in: Neural Networks, 2009. IJCNN 2009. International Joint Conference on, IEEE, 2009, pp. 302–307.
[37] B. S. Wade, S. H. Joshi, T. Pirnia, A. M. Leaver, R. P. Woods, P. M. Thompson, R. Espinoza, K. L. Narr, Random forest classification of depression status based on subcortical brain morphometry following electroconvulsive therapy, in: Biomedical Imaging (ISBI), 2015 IEEE 12th International Symposium on, IEEE, 2015, pp. 92–96.
[38] M. Sifuzzaman, M. Islam, M. Ali, Application of wavelet transform and its advantages compared to Fourier transform.
[39] M. Kociolek, A. Materka, M. Strzelecki, P. Szczypiński, Discrete wavelet transform-derived features for digital image texture analysis, in: International Conference on Signals and Electronic Systems, 2001, pp. 99–104.
[40] S. K. Mohideen, S. A. Perumal, M. M. Sathik, Image de-noising using discrete wavelet transform, International Journal of Computer Science and Network Security 8 (1) (2008) 213–216.
[41] C.-L. Yang, W.-R. Gao, L.-M. Po, Discrete wavelet transform-based structural similarity for image quality assessment, in: Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on, IEEE, 2008, pp. 377–380.
[42] I. Krashenyi, A. Popov, J. Ramirez, J. M. Gorriz, Application of fuzzy logic for Alzheimer’s disease diagnosis, in: Signal Processing Symposium (SPSympo), 2015, IEEE, 2015, pp. 1–4.
[43] A. Osareh, M. Mirmehdi, B. Thomas, R. Markham, Automated identification of diabetic retinal exudates in digital colour images, Br J Ophthalmol 87 (2003) 1220–1223.
[44] A. P. Kumar, J. K. Chaithanya, Automatic classification and segmentation of tumors from skull stripped images using PNN & SFCM, Global Journal of Computer Science and Technology 15 (1).
[45] S. D. S. Al-Shaikhli, M. Y. Yang, B. Rosenhahn, Brain tumor classification using sparse coding and dictionary learning, in: Image Processing (ICIP), 2014 IEEE International Conference on, IEEE, 2014, pp. 2774–2778.
[46] M. M. Letteboer, O. F. Olsen, E. B. Dam, P. W. Willems, M. A. Viergever, W. J. Niessen, Segmentation of tumors in magnetic resonance brain images using an interactive multiscale watershed algorithm, Academic Radiology 11 (10) (2004) 1125–1138.
[47] B. H. Menze, K. Van Leemput, D. Lashkari, M.-A. Weber, N. Ayache, P. Golland, A generative model for brain tumor segmentation in multi-modal images, in: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2010, Springer, 2010, pp. 151–159.
[48] J. Selvakumar, A. Lakshmi, T. Arivoli, Brain tumor segmentation and its area calculation in brain MR images using K-mean clustering and fuzzy C-mean algorithm, in: Advances in Engineering, Science and Management (ICAESM), 2012 International Conference on, IEEE, 2012, pp. 186–190.