# Deep Learning in Process Optimization, Part 2 - Building a Deep Neural Network

The previous post in the *Deep Learning in Process Optimization* series covered data collection, exploratory data analysis and data preprocessing. This post is about the next stage: building a deep neural network.

Once the time series data collected from the plant processes has been preprocessed, various machine learning techniques can be applied to analyze the data and gain new knowledge. Because the data and the modeled effects are most likely very complex, it is not surprising if several methods give poor results. However, based on earlier literature, at least one class of deep learning algorithms is a promising approach for modeling complex time series data: the *recurrent neural network (RNN)*.

An RNN is a special type of neural network designed for sequential data. In a basic *feedforward neural network*, data points travel from an input layer through hidden layers to an output layer along a directed acyclic graph. The hidden layers perform a set of calculations on the data points to prepare them for the final prediction. In an RNN, data points are analyzed as a sequence. When a single data point has passed through the hidden layers, the resulting state is fed back into the network and used when analyzing the next data point of the sequence. This makes an RNN capable of detecting dependencies between consecutive data points. Both network structures are shown in Figure 1.
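The feedback loop described above can be illustrated with a minimal sketch. It assumes a single hidden unit with a `tanh` activation and hand-picked weights (`w_x`, `w_h`, `b` are illustrative values, not trained ones); a real RNN layer uses weight matrices learned from data.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step for a single hidden unit (assumed tanh activation)."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Process a sequence point by point, carrying the hidden state forward."""
    h = 0.0  # initial hidden state: no history yet
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)  # the previous state feeds into this step
        states.append(h)
    return states

# The same final input (1.0) gives a different state depending on what preceded it.
states_a = run_rnn([0.0, 1.0])
states_b = run_rnn([1.0, 1.0])
```

Because the hidden state is carried from one step to the next, the final values of `states_a` and `states_b` differ even though both sequences end with the same data point.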

*Deep learning* is a family of machine learning algorithms in which neural networks are the primary method. Thus, when we analyze production process data from a plant with an RNN, we are applying deep learning to make sense of the effects in the production process. We call a neural network deep when it has more than two hidden layers, so our RNN is a deep neural network when it has at least three hidden layers.

Before an RNN can be applied in practice, it has to be trained. A standard approach in machine learning is to split the preprocessed data into a training set and a test set. The RNN uses the training set to build a model, i.e., to learn how the data points behave. The test set is then used to measure how well the RNN predicts the effects from unseen data.
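A minimal sketch of such a split follows. It assumes the split is chronological (earlier data for training, later data for testing), which is a common choice for time series but is not stated in this post; the function name and the 80/20 ratio are illustrative.

```python
def chronological_split(samples, train_fraction=0.8):
    """Split ordered samples into an earlier training part and a later test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    split_index = int(len(samples) * train_fraction)
    return samples[:split_index], samples[split_index:]

readings = list(range(10))  # stand-in for time-ordered, preprocessed data points
train_set, test_set = chronological_split(readings, 0.8)
# train_set == [0, 1, 2, 3, 4, 5, 6, 7], test_set == [8, 9]
```

Keeping the order intact means the test set contains only data recorded after the training data, so the model is evaluated on genuinely unseen, later behavior.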

In time series prediction, the training and test sets have to be divided into time windows. Each time window contains a sequence of data points, and the length of this sequence depends on how long an interesting effect or anomaly lasts in the data. If it lasts, for example, 30 minutes, the window size can be chosen so that each time window covers 30 minutes. If the data points arrive every 10 seconds, every time window then contains 180 sequential data points (30 × 60 s / 10 s = 180).
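The window size arithmetic can be written as a small helper. This is only a sketch: the function name is hypothetical, and it assumes the effect duration is a whole multiple of the sampling interval.

```python
def window_length(effect_duration_s, sampling_interval_s):
    """Number of data points in a window covering the given duration."""
    if sampling_interval_s <= 0:
        raise ValueError("sampling_interval_s must be positive")
    if effect_duration_s % sampling_interval_s != 0:
        raise ValueError("duration must be a multiple of the sampling interval")
    return effect_duration_s // sampling_interval_s

# A 30-minute effect sampled every 10 seconds.
points_per_window = window_length(30 * 60, 10)  # 180
```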

To make maximal use of the data, the time windows are constructed with the *moving window technique*, which allows a data point to belong to several time windows. For example, if the data is [1, 2, 3, 4, 5] and the window size is 3, the resulting time windows are [1, 2, 3], [2, 3, 4] and [3, 4, 5]. Figure 2 shows the technique on a more general level.
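The example above can be reproduced with a few lines of code. This is a minimal sketch that assumes a step of one data point between consecutive windows; the function name is illustrative.

```python
def moving_windows(data, window_size):
    """Return every contiguous window of window_size points, moving one point at a time."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [data[i:i + window_size] for i in range(len(data) - window_size + 1)]

windows = moving_windows([1, 2, 3, 4, 5], 3)
# windows == [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

A series of n data points yields n - window_size + 1 windows, so the 180-point windows mentioned earlier would overlap heavily and reuse most data points many times.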

The time windows must be formed carefully: the timestamps have to confirm that the data points in every time window are spaced at the correct frequency. If the data quality is poor, some time windows may contain overly long gaps between successive data points because data points are missing between them. Processing such time series data without accounting for the actual gaps may distort the results. Time windows with an invalid resolution must therefore be discarded.
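One way to perform this check is sketched below. The helper names, the 10-second step and the zero tolerance are assumptions made for illustration; real plant data may need a small tolerance for timestamp jitter.

```python
from datetime import datetime, timedelta

def has_valid_resolution(timestamps, expected_step, tolerance=timedelta(0)):
    """True if every gap between consecutive timestamps matches expected_step."""
    return all(
        abs((later - earlier) - expected_step) <= tolerance
        for earlier, later in zip(timestamps, timestamps[1:])
    )

def valid_window_starts(timestamps, window_size, expected_step):
    """Start indices of the moving windows that contain no irregular gaps."""
    return [
        start
        for start in range(len(timestamps) - window_size + 1)
        if has_valid_resolution(timestamps[start:start + window_size], expected_step)
    ]

start_time = datetime(2020, 1, 1)
step = timedelta(seconds=10)
# Seven 10-second slots, with the reading at slot 3 missing.
stamps = [start_time + i * step for i in (0, 1, 2, 4, 5, 6)]
valid_starts = valid_window_starts(stamps, window_size=3, expected_step=step)
# valid_starts == [0, 3]: the two windows spanning the 20-second gap are rejected
```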

The RNN is trained to predict the value of the target variable at the last timestamp of each time window. It analyzes every time window separately and tries to learn what target variable value each data point sequence produces. However, it is not always that simple. Sometimes there is a delay before a data sequence actually affects the target variable, so the RNN has to be trained to predict the future. If the delay is, for example, 5 minutes, the RNN is trained to predict the value of the target variable 5 minutes after the end of the sequence. Predicting the future is harder than predicting the present, and for the RNN to be good at forecasting the future, it first has to be good at forecasting the present.
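Pairing each window with its target value might look like the following sketch. The function and parameter names are hypothetical; `horizon_steps` expresses the delay in data points, so a 5-minute delay at a 10-second frequency would correspond to `horizon_steps=30`.

```python
def make_samples(features, targets, window_size, horizon_steps=0):
    """Pair each window with the target value horizon_steps after the window's last point."""
    samples = []
    last_start = len(features) - window_size - horizon_steps
    for start in range(last_start + 1):
        end = start + window_size               # exclusive end of the window
        target_index = end - 1 + horizon_steps  # last point of the window plus the delay
        samples.append((features[start:end], targets[target_index]))
    return samples

features = [1, 2, 3, 4, 5, 6]
targets = [10, 20, 30, 40, 50, 60]

present = make_samples(features, targets, window_size=3)
# present[0] == ([1, 2, 3], 30): target at the window's last timestamp
future = make_samples(features, targets, window_size=3, horizon_steps=2)
# future[0] == ([1, 2, 3], 50): target two steps after the window ends
```

Note that a longer horizon leaves fewer usable samples, because the last windows have no target value far enough in the future.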

To develop as accurate an RNN as possible, the structure of the RNN model must be chosen carefully. Neural networks generally require their *hyperparameters* to be tuned. This may require importing the data into a cloud environment with sufficient computing power, where the RNN is trained several times with different hyperparameter values. When the optimal hyperparameters have been found, the final RNN model can be exported and the data can be removed from the cloud. The RNN model file, made for example with Keras or TensorFlow, is usually very small and can be used outside the cloud environment to predict the value of the target variable. Figure 3 shows example curves of an accurate prediction result.
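The repeated training with different hyperparameter values can be organized as a simple grid search, sketched below. This is an assumed approach, not necessarily the one used in this series. The scoring function `dummy_train_and_score` is a placeholder with made-up behavior that stands in for training an RNN and returning its validation error; the hyperparameter names and values are illustrative only.

```python
from itertools import product

def grid_search(train_and_score, grid):
    """Try every combination in grid; return (best_params, best_score), lower is better."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)  # in practice: train the RNN, return validation error
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def dummy_train_and_score(params):
    """Placeholder for real training: an artificial error that is lowest at 3 layers, 64 units."""
    return abs(params["hidden_layers"] - 3) + abs(params["units"] - 64) / 64

grid = {"hidden_layers": [1, 2, 3, 4], "units": [32, 64, 128]}
best_params, best_score = grid_search(dummy_train_and_score, grid)
```

Every combination requires a full training run, which is why the number of runs grows quickly with the grid size and why extra computing power may be needed.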

The next step is to find the optimal adjustments to minimize the target variable. Read more about that in the next blog post, *Deep Learning in Process Optimization, Part 3: Finding the Optimal Process Values*.