Neural Networks are a versatile tool for state-of-the-art prediction problems. However, training them involves numerous hyper-parameters, so configuring the training process demands expert knowledge and is often described as trial and error. As a result, training must be executed many times, which is highly time-consuming. One current solution to this problem is the Grid-Search algorithm: a set of candidate values (essentially guesses) is declared for each hyper-parameter, and every combination of these values is used to configure a training session. Once every model (one per hyper-parameter combination) has been trained, the best-performing model is retained and the rest are discarded. This approach is wasteful because it explores hyper-parameter combinations that predictably produce poor models. It is also very time-consuming and scales poorly with the size of the model. This thesis proposes a number of methods for efficiently deriving hyper-parameters and model parameters, and presents their empirical results. The methods fall into two categories: Weight-Direct Determination (WDD) and the Simple Effective Evolutionary Method (SEEM). The former succeeds in certain cases, whereas the latter succeeds broadly across Classification and Regression, on datasets with both large and small numbers of samples and features. The thesis concludes that WDD is effective only on small datasets (in terms of both the number of samples and the number of input features). This is due to its dependence on Delaunay Triangulation, which exhibits quadratic time complexity with respect to the number of input samples. The WDD methods developed in this research are therefore deemed not optimal for general-purpose application of Multi-Layer Perceptrons.
However, the Complete Simple Effective Evolutionary Method (CSEEM) from the SEEM Chapter shows great promise, as it performs effectively on the `Knowledge Extraction based on Evolutionary Learning' (KEEL) Datasets for both Regression and Classification. It achieves this while requiring only a single hyper-parameter (the number of children in a population), which is fairly invariant across datasets. In this thesis, CSEEM is applied to real-world regression and classification problems and compared against RMSProp, an existing gradient-dependent iterative method. As the number of hidden nodes is varied, CSEEM consistently achieves a lower normalized square loss on the regression problems and a higher classification accuracy on the classification problems than RMSProp.
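
To illustrate the Grid-Search procedure described earlier in this abstract, the following is a minimal sketch in Python. It is not code from the thesis: the hyper-parameter names, the candidate values, and the function `toy_train_and_score` are hypothetical stand-ins, and a real training session would replace the toy scoring function.

```python
from itertools import product


def grid_search(grid, train_and_score):
    """Run one training session per hyper-parameter combination, keep the best.

    grid: dict mapping a hyper-parameter name to a list of candidate values.
    train_and_score: callable taking a configuration dict and returning a
        score where higher is better (assumed interface, for illustration).
    """
    names = list(grid)
    best_config, best_score = None, float("-inf")
    n_trained = 0
    # Enumerate the Cartesian product of all candidate values.
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = train_and_score(config)  # stands in for a full training run
        n_trained += 1
        if score > best_score:
            # Retain the best model so far; all others are discarded.
            best_config, best_score = config, score
    return best_config, best_score, n_trained


def toy_train_and_score(config):
    # Hypothetical scoring function: it peaks at learning_rate=0.01 and
    # hidden_nodes=32, and ignores batch_size entirely.
    lr_penalty = abs(config["learning_rate"] - 0.01)
    size_penalty = abs(config["hidden_nodes"] - 32) / 100
    return -lr_penalty - size_penalty


# Hypothetical candidate values (the "guesses") for each hyper-parameter.
grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "hidden_nodes": [8, 32, 128],
    "batch_size": [16, 64],
}

best_config, best_score, n_trained = grid_search(grid, toy_train_and_score)
print(best_config, n_trained)
```

With three, three, and two candidate values, this sketch runs 3 × 3 × 2 = 18 training sessions to retain a single model. Each additional hyper-parameter multiplies that count by its number of candidate values, which is the source of the wastefulness noted above.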