Introduction to Artificial Neural Networks – Analytics Vidhya

This article was published as a part of the Data Science Blogathon

Artificial Neural Networks (ANNs) are algorithms modeled on brain function and are used to model complicated patterns and solve forecasting problems. The ANN is a deep learning method that arose from the concept of the biological neural networks of the human brain. ANNs were developed in an attempt to replicate the workings of the human brain; they work in a way that is similar, though not identical, to biological neural networks. In the form discussed in this article, an ANN accepts only numeric, structured data.

Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) are used to handle unstructured and non-numeric forms of data such as images, text, and speech. This article focuses solely on Artificial Neural Networks.

Artificial Neural Networks Architecture

1. There are three types of layers in the network architecture: the input layer, the hidden layer (there can be more than one), and the output layer. Because of these multiple layers, ANNs are sometimes referred to as MLPs (Multi-Layer Perceptrons).

Image 1: Artificial Neural Networks

2. It is possible to think of the hidden layer as a “distillation layer,” which extracts some of the most relevant patterns from the inputs and sends them on to the next layer for further analysis. It accelerates and improves the efficiency of the network by recognizing just the most important information from the inputs and discarding the redundant information.

3. The activation function is important for two reasons:

  • It captures non-linear relationships between the inputs.
  • It helps convert the input into a more useful output.

Image 2: Activation function

4. Finding the optimal values of the weights W that minimize prediction error is critical to building a successful model. The “backpropagation algorithm” does this: it turns the ANN into a learning algorithm that learns from its mistakes.

5. The optimization approach uses a “gradient descent” technique to reduce prediction errors. To find the optimum value for W, small adjustments to W are tried, and their impact on the prediction error is examined. The W values at which further changes no longer reduce the error are chosen as the final weights.
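
Points 4 and 5 can be made concrete with a minimal NumPy sketch: one hidden layer, a forward pass, backpropagation of the error, and gradient-descent updates of the weights. The XOR data, the hidden-layer size, the learning rate, and the function name `train_tiny_mlp` are illustrative assumptions, not something this article prescribes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_tiny_mlp(X, y, hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a 1-hidden-layer network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, size=(hidden, 1))
    b2 = np.zeros(1)
    n = X.shape[0]
    eps = 1e-12  # avoids log(0)
    losses = []
    for _ in range(epochs):
        # forward pass: hidden layer (tanh) then output layer (sigmoid)
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        losses.append(float(loss))
        # backward pass: propagate the prediction error back to each weight
        dz2 = (p - y) / n              # gradient at the output pre-activation
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0)
        # gradient descent: small step against the gradient
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

# XOR: a small non-linear problem used purely for illustration
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])
params, losses = train_tiny_mlp(X, y)
print("first loss:", losses[0], "last loss:", losses[-1])
```

Comparing the first and last recorded loss shows the effect of the repeated small weight adjustments described above.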

Benefits of Artificial Neural Networks

ANNs offer several key benefits that make them particularly well-suited to specific problems and situations:

1. ANNs can learn and model non-linear and complicated interactions, which is critical since many of the relationships between inputs and outputs in real life are non-linear and complex.

2. ANNs can generalize – After learning from the original inputs and their relationships, the model can infer unknown relationships in unseen data, allowing it to generalize and make predictions on data it has not encountered before.

3. Unlike many other prediction approaches, ANNs do not impose constraints on the input variables (such as how they should be distributed). Furthermore, numerous studies have shown that ANNs can better model heteroskedasticity, that is, data with high volatility and non-constant variance, because of their capacity to discover latent relationships in the data without imposing any preset associations. This is particularly helpful in financial time series forecasting (for example, stock prices), where data volatility is significant.

Application of Artificial Neural Networks

ANNs have a wide range of applications because of their unique properties. A few of the important applications of ANNs include:

1. Image Processing and Character recognition:

ANNs play a significant part in image and character recognition because of their capacity to take in many inputs, process them, and infer hidden, complicated, non-linear relationships. Character recognition, such as handwriting recognition, has many applications in fraud detection (for example, bank fraud) and even national security assessments.

Image 3: Applications of Artificial Neural Networks

Image recognition is a rapidly evolving discipline with several applications ranging from social media facial identification to cancer detection in medicine to satellite image processing for agricultural and defense purposes.

Deep neural networks, which form the core of “deep learning,” grew out of ANN research and have opened up new and transformative advances in computer vision, speech recognition, and natural language processing; self-driving vehicles are a notable example.

2. Forecasting:

Forecasting is widely used in everyday company decisions (sales, the financial allocation between goods, and capacity utilization), economic and monetary policy, finance, and the stock market. Forecasting issues are frequently complex; for example, predicting stock prices is complicated with many underlying variables (some known, some unseen).

Image 4: Forecasting

Traditional forecasting models have limitations when it comes to accounting for these complicated, non-linear relationships. Given their capacity to model and extract previously unknown features and relationships, ANNs can provide a reliable alternative when used correctly. Unlike conventional models, ANNs also place no restrictions on the input and residual distributions.
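
Because an ANN expects structured numeric inputs, a forecasting problem is usually reframed as supervised learning before a network is applied: the previous few observations become the input row and the next observation becomes the target. The sketch below shows that windowing step only; the helper name `make_windows`, the window length, and the price values are made up for illustration.

```python
def make_windows(series, window):
    """Turn a 1-D series into (inputs, targets) pairs for supervised learning."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])  # past `window` values
        targets.append(series[start + window])       # the value to predict
    return inputs, targets

# made-up price series, for illustration only
prices = [10, 11, 13, 12, 15, 16]
window_inputs, window_targets = make_windows(prices, window=3)

for row, target in zip(window_inputs, window_targets):
    print(row, "->", target)
```

Each printed row is one training example that could be fed to a network like the one built later in this article.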

Advantages of Artificial Neural Networks

  1. Problems are represented as attribute-value pairs in ANNs.
  2. The target function, and therefore the output of an ANN, can be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.
  3. ANN learning techniques are fairly robust to noise in the training data: the training samples may contain errors without these necessarily having a large effect on the final result.
  4. ANNs are a good fit when fast evaluation of the learned target function is required.
  5. ANNs suit problems where long training times are acceptable, since the number of weights in the network, the number of training instances, and the settings of the learning algorithm parameters can all extend training.

Disadvantages of Artificial Neural Networks

1. Hardware Dependence:

  • Building Artificial Neural Networks calls for processors with parallel processing capability.
  • As a result, realizing the network depends on having suitable hardware.

2. Understanding the network’s operation:

  • This is the most serious issue with ANNs.
  • When an ANN produces a solution, it does not explain why or how it arrived at it.
  • As a result, trust in the network is reduced.

3. Determining the proper network structure:

  • No precise rule determines the structure of an artificial neural network.
  • A suitable network structure is found through experience and trial and error.

4. Difficulty in presenting the issue to the network:

  • ANNs work with numerical data.
  • Problems must be converted into numerical values before being presented to an ANN.
  • The representation that is chosen has a direct impact on the network’s performance.
  • The quality of this representation depends on the user’s skill.

5. The duration of training is unknown:

  • Training is considered complete when the network’s error on the sample has been reduced to a specific value.
  • Reaching this value does not guarantee the best results.
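
On the last point, one common alternative to stopping at a fixed error value is early stopping: training halts once the validation loss has stopped improving for a set number of epochs. This is a different technique from the fixed threshold described above, shown here only as a rough sketch; the function name `early_stopping_epoch`, the `patience` value, and the loss values are assumptions for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # validation loss improved: remember this epoch
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # no improvement for `patience` epochs: stop here
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# made-up validation losses, one per epoch
val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print("stopped at epoch", stop_epoch, "best epoch was", best_epoch)
```

Note that in this made-up sequence the loss would have improved again at the final epoch, which illustrates why any stopping rule is a trade-off rather than a guarantee.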

Create a Simple ANN for the famous Titanic Dataset

Now that we have discussed the architecture, advantages, and disadvantages, it’s time to create an ANN model to see how it works.

To understand ANNs, we will use the world-famous Titanic survival prediction problem. You can find the dataset here: https://www.kaggle.com/jamesleslie/titanic-neural-network-for-beginners/data?select=train_clean.csv

Let’s start by importing the dependencies.

# import dependencies
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.pyplot import rcParams
# Jupyter/IPython magic: only valid inside a notebook
%matplotlib inline
rcParams['figure.figsize'] = 10,8
sns.set(style='whitegrid', palette='muted',
        rc={'figure.figsize': (15,10)})
import os
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from numpy.random import seed
# TensorFlow 1.x API; in TensorFlow 2.x the equivalent is tf.random.set_seed
from tensorflow import set_random_seed

Once all the preprocessing and modeling libraries are imported, we read the training and testing data.

# Load data as Pandas dataframe
train = pd.read_csv('./train_clean.csv')
test = pd.read_csv('./test_clean.csv')
df = pd.concat([train, test], axis=0, sort=True)
df.head()

Image: dataset (Source: Local)

We have concatenated the training and testing CSVs in order to apply the same preprocessing to both of them. Once the dataset is created, we start preprocessing it, since it has multiple non-numeric columns. Starting with the ‘Sex’ column, we convert it to a binary variable.

# convert to category dtype
df['Sex'] = df['Sex'].astype('category')
# convert to category codes
df['Sex'] = df['Sex'].cat.codes
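
To see what the category codes mean, the same two steps can be run on a tiny hand-made frame. The four rows below are invented for illustration; pandas sorts string categories alphabetically, so 'female' receives code 0 and 'male' code 1.

```python
import pandas as pd

# toy data, not the real Titanic file
toy = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male']})

sex_as_category = toy['Sex'].astype('category')
# code -> label lookup, useful for checking which value became 0 and which 1
code_to_label = dict(enumerate(sex_as_category.cat.categories))
toy['Sex'] = sex_as_category.cat.codes

print(code_to_label)
print(toy)
```

Keeping the lookup around makes it easier to interpret the encoded column later.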

After this, we need to convert the rest of the variables:

# subset all categorical variables which need to be encoded
categorical = ['Embarked', 'Title']
for var in categorical:
    df = pd.concat([df, 
                    pd.get_dummies(df[var], prefix=var)], axis=1)
    del df[var]
# drop the variables we won't be using
df.drop(['Cabin', 'Name', 'Ticket', 'PassengerId'], axis=1, inplace=True)
df.head()

Image: cleaned data (Source: Local)
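
The effect of `pd.get_dummies` with a `prefix` is easiest to see on a small example. The sketch below uses an invented four-row frame rather than the real data: each distinct value of `Embarked` becomes its own indicator column, and exactly one indicator is set per row.

```python
import pandas as pd

# toy data, not the real Titanic file
toy = pd.DataFrame({'Embarked': ['S', 'C', 'Q', 'S'],
                    'Fare': [7.25, 71.28, 7.75, 8.05]})

# one indicator column per distinct value, named Embarked_<value>
dummies = pd.get_dummies(toy['Embarked'], prefix='Embarked')
toy = pd.concat([toy, dummies], axis=1).drop(columns='Embarked')

print(toy.columns.tolist())
print(toy)
```

Depending on the pandas version, the indicator columns are stored as integers or booleans; both work as numeric inputs once cast.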

# scale continuous variables
continuous = ['Age', 'Fare', 'Parch', 'Pclass', 'SibSp', 'Family_Size']
scaler = StandardScaler()
for var in continuous:
    df[var] = df[var].astype('float64')
    df[var] = scaler.fit_transform(df[var].values.reshape(-1, 1))
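
For reference, `StandardScaler` standardizes a column by subtracting its mean and dividing by its (population) standard deviation. The NumPy sketch below reproduces that arithmetic on a few made-up fare values; the helper name `standardize` is hypothetical.

```python
import numpy as np

def standardize(values):
    """Z-score a 1-D array: zero mean, unit (population) standard deviation."""
    values = np.asarray(values, dtype='float64')
    return (values - values.mean()) / values.std()

# made-up fares, for illustration only
fares = [7.25, 71.28, 8.05, 53.10]
scaled_fares = standardize(fares)

print(scaled_fares)
print("mean:", scaled_fares.mean(), "std:", scaled_fares.std())
```

Putting all continuous inputs on a comparable scale generally helps gradient-based training. One design point to keep in mind: scaling the concatenated train and test frame, as done above, lets test-set statistics influence the transform, so a stricter setup would fit the scaler on the training rows only.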

Once preprocessing is done, we need to split the dataset into train and test sets again. For that, you can use the following code.

X_train = df[pd.notnull(df['Survived'])].drop(['Survived'], axis=1)
y_train = df[pd.notnull(df['Survived'])]['Survived']
X_test = df[pd.isnull(df['Survived'])].drop(['Survived'], axis=1)
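
This split works because the test file has no `Survived` column, so after concatenation those rows hold missing values there. A small invented frame makes the mechanism visible; the column values and the `_toy` names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# toy combined frame: the last two rows stand in for test rows (no label)
toy = pd.DataFrame({'Age': [22.0, 38.0, 26.0, 35.0],
                    'Survived': [0.0, 1.0, np.nan, np.nan]})

has_label = toy['Survived'].notnull()
X_train_toy = toy[has_label].drop(columns='Survived')
y_train_toy = toy.loc[has_label, 'Survived']
X_test_toy = toy[~has_label].drop(columns='Survived')

print(len(X_train_toy), "training rows,", len(X_test_toy), "test rows")
```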

Now is the time to define the hyperparameters and define the architecture of the ANN model.

def create_model(lyrs=(8,), act='linear', opt='Adam', dr=0.0):
    # set random seed for reproducibility
    seed(42)
    set_random_seed(42)
    model = Sequential()
    # create first hidden layer
    model.add(Dense(lyrs[0], input_dim=X_train.shape[1], activation=act))
    # create additional hidden layers
    for i in range(1, len(lyrs)):
        model.add(Dense(lyrs[i], activation=act))
    # add dropout, default is none
    model.add(Dropout(dr))
    # create output layer
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model

# build the model with the default hyperparameters defined above
model = create_model()
# summary() prints the layer table itself
model.summary()
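
One consequence of `act='linear'` in the code above is worth seeing directly: without a non-linear activation, stacked dense layers compose into a single linear map, so the hidden layer adds no modelling capacity beyond what the sigmoid output unit already has. The NumPy sketch below checks this on random weights; the shapes (3 inputs, 8 hidden units) and all variable names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 3))          # 5 samples, 3 features

# two stacked layers with linear (identity) activation
W1, b1 = rng.normal(size=(3, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)
two_layer_out = (inputs @ W1 + b1) @ W2 + b2

# the same computation collapsed into one layer
W_combined = W1 @ W2
b_combined = b1 @ W2 + b2
one_layer_out = inputs @ W_combined + b_combined

print(np.allclose(two_layer_out, one_layer_out))
```

Switching the hidden activation to a non-linear function such as `'relu'` or `'tanh'` is what lets the hidden layer capture the non-linear relationships discussed earlier.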

Image: model summary (Source: Local)
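
The parameter counts in a model summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit, and a dropout layer has no parameters. The sketch below does that arithmetic; the input width of 16 is a hypothetical value (the real one is `X_train.shape[1]`), and `dense_param_counts` is a made-up helper name.

```python
def dense_param_counts(n_inputs, layer_sizes):
    """Return the number of trainable parameters of each dense layer."""
    counts = []
    fan_in = n_inputs
    for units in layer_sizes:
        counts.append(fan_in * units + units)  # weights + biases
        fan_in = units
    return counts

# hypothetical 16 input features, one hidden layer of 8 units, 1 output unit
per_layer = dense_param_counts(16, [8, 1])
print(per_layer, "total:", sum(per_layer))
```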

After defining the model, we fit it on our training data and look at how it performs.

# train model on full train set, with 80/20 CV split
training = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2, verbose=0)
# history keys are 'acc'/'val_acc' in older Keras; newer versions use 'accuracy'/'val_accuracy'
val_acc = np.mean(training.history['val_acc'])
print("\n%s: %.2f%%" % ('val_acc', val_acc*100))
# summarize history for accuracy
plt.plot(training.history['acc'])
plt.plot(training.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

Image: train and validation accuracy (Source: Local)
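
The name under which accuracy is stored in `training.history` depends on the Keras version: older releases use `'acc'` and `'val_acc'`, newer ones `'accuracy'` and `'val_accuracy'`. A small lookup helper avoids a `KeyError` either way. This is only a sketch; `pick_metric` is a hypothetical helper, and the dictionaries below are made-up stand-ins for a real history.

```python
def pick_metric(history, *candidate_keys):
    """Return the first metric series found under any of the candidate keys."""
    for key in candidate_keys:
        if key in history:
            return history[key]
    raise KeyError("none of %r found in history" % (candidate_keys,))

# made-up histories mimicking the two naming schemes
old_style = {'acc': [0.70, 0.80], 'val_acc': [0.60, 0.70]}
new_style = {'accuracy': [0.70, 0.80], 'val_accuracy': [0.60, 0.70]}

val_old = pick_metric(old_style, 'val_acc', 'val_accuracy')
val_new = pick_metric(new_style, 'val_acc', 'val_accuracy')
print(val_old, val_new)
```

With a real model, the call would be `pick_metric(training.history, 'val_acc', 'val_accuracy')`.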

Now you can use the model for predictions on test data, using the following code chunk:

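# calculate predictions
# predict() returns probabilities with shape (n, 1); flatten them to one value per row
test['Survived'] = model.predict(X_test).ravel()
# round each probability to the nearest class label (0 or 1)
test['Survived'] = test['Survived'].apply(lambda x: round(x,0)).astype('int')
solution = test[['PassengerId', 'Survived']]
print(solution)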
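
The rounding step above amounts to thresholding the sigmoid output at 0.5. An equivalent, vectorized way to write it with NumPy is sketched below on made-up probabilities; the array shape mimics the `(n, 1)` output of a single-unit sigmoid layer.

```python
import numpy as np

# made-up predicted probabilities, shaped like the output of a 1-unit sigmoid layer
probabilities = np.array([[0.91], [0.08], [0.50], [0.63]])

# flatten to 1-D, then label as survived (1) when the probability exceeds 0.5
predicted_labels = (probabilities.ravel() > 0.5).astype(int)
print(predicted_labels)
```

Writing the threshold explicitly also makes it easy to change if a different cut-off suits the problem better.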

Image: predictions (Source: Local)

Conclusion:

Artificial neural networks (ANNs) are powerful models that can be applied in many scenarios. Several noteworthy uses of ANNs have been mentioned above, and they have applications across many industries, including medicine, security, finance, government, agriculture, and defense.

References:

https://www.kaggle.com

  1. Image 1: https://www.analyticsvidhya.com
  2. Image 2: https://medium.com
  3. Image 3: https://medium.com
  4. Image 4: https://medium.com

Thanks for reading this article. Do like it if you have learned something new, and feel free to comment. See you next time! ❤️

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.