TensorFlow Study Notes (3): LOGISTIC REGRESSION WITH TENSORFLOW
What is the difference between Linear and Logistic Regression?
While Linear Regression is suited for estimating continuous values (e.g. estimating house price), it isn’t the best tool for predicting the class of an observed data point. In order to estimate the class of a data point, we need some sort of guidance on what the most probable class for that point would be. For this, we use Logistic Regression.
Recall linear regression:
Linear regression finds a function that relates a continuous dependent variable, y, to some predictors (independent variables $x_1$, $x_2$, etc.). Simple linear regression assumes a function of the form:

$$y = w_0 + w_1 x_1 + w_2 x_2 + \cdots$$

and finds the values of $w_0$, $w_1$, $w_2$, etc. The term $w_0$ is the “intercept” or “constant term” (it’s shown as $b$ in the formula below):

$$Y = W X + b$$
Logistic Regression is a variation of Linear Regression, useful when the observed dependent variable, y, is categorical. It produces a formula that predicts the probability of the class label as a function of the independent variables.
Despite the name logistic regression, it is actually a probabilistic classification model. Logistic regression fits a special S-shaped curve by taking the linear regression output and transforming the numeric estimate into a probability with the following function:

$$p = \theta(y) = \frac{e^{y}}{1 + e^{y}}$$

which produces p-values between 0 (as y approaches minus infinity) and 1 (as y approaches plus infinity). This now becomes a special kind of non-linear regression.
In this equation, $y$ is the regression result (the sum of the variables weighted by the coefficients), exp is the exponential function, and $\theta(y)$ is the logistic function, also called the logistic curve. It is a common “S” shape (sigmoid curve), and was first developed for modelling population growth.
You might also have seen this function before, in another configuration:

$$\sigma(t) = \frac{1}{1 + e^{-t}}$$

So, briefly, Logistic Regression passes the input through the logistic (sigmoid) function and then treats the result as a probability.
Utilizing Logistic Regression in TensorFlow
For us to utilize Logistic Regression in TensorFlow, we first need to import whatever libraries we are going to use. To do so, you can run the code cell below.
import tensorflow as tf
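The original code cell is cut off after its first line. Here is a minimal sketch of a full import cell, assuming scikit-learn supplies the iris data and the train/test split, and matplotlib is used for the plot at the end:

```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Assumed helpers: scikit-learn's bundled iris dataset and its train/test splitter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
```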
Next, we will load the dataset we are going to use. In this case, we are utilizing the iris dataset, which comes built into scikit-learn, so there’s no need to do any preprocessing and we can jump right into manipulating it. We separate the dataset into xs and ys, and then into training xs and ys and testing xs and ys, (pseudo-)randomly.
iris = load_iris()
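The rest of this cell is also truncated. A sketch of the loading and splitting step; the one-hot encoding of the labels and the variable names (trainX, trainY, testX, testY) are my own choices, not necessarily the original ones:

```python
iris = load_iris()
iris_X, iris_y = iris.data, iris.target

# One-hot encode the three class labels, e.g. class 1 -> [0, 1, 0]
iris_y = np.eye(3)[iris_y]

# (Pseudo-)randomly split the data into training and testing sets
trainX, testX, trainY, testY = train_test_split(
    iris_X, iris_y, test_size=0.33, random_state=42)
```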
Now we define x and y. These placeholders will hold our iris data (both the features and label matrices), and help pass them along to different parts of the algorithm. You can consider placeholders as empty shells into which we insert our data. We also need to give them shapes which correspond to the shape of our data. Later, we will insert data into these placeholders by “feeding” the placeholders the data via a “feed_dict” (Feed Dictionary).
Why use Placeholders?
1) This feature of TensorFlow allows us to create an algorithm which accepts data and knows something about the shape of the data without knowing the amount of data going in.
2) When we insert “batches” of data in training, we can easily adjust how many examples we train on in a single step without changing the entire algorithm.
# numFeatures is the number of features in our input data.
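A sketch of how this placeholder cell might continue, assuming the placeholder names X and yGold for the features and labels (these names are carried through the remaining sketches):

```python
# numFeatures is the number of features in our input data (4 measurements per flower)
numFeatures = trainX.shape[1]
# numLabels is the number of classes a data point can belong to (3 iris species)
numLabels = trainY.shape[1]

# 'None' means the placeholders can accept any number of examples at once
X = tf.placeholder(tf.float32, [None, numFeatures])    # iris feature matrix
yGold = tf.placeholder(tf.float32, [None, numLabels])  # one-hot class labels
```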
Set model weights and bias
Much like Linear Regression, we need a shared variable weight matrix for Logistic Regression. We initialize both W and b as tensors full of zeros. Since we are going to learn W and b, their initial values don’t matter too much. These variables are the objects which define the structure of our regression model, and we can save them after they’ve been trained so we can reuse them later.
We define two TensorFlow variables as our parameters. These variables will hold the weights and biases of our logistic regression and they will be continually updated during training.
Notice that W has a shape of [4, 3] because we want to multiply the 4-dimensional input vectors by it to produce 3-dimensional vectors of evidence for the different classes. b has a shape of [3] so we can add it to the output. Moreover, unlike our placeholders above, which are essentially empty shells waiting to be fed data, TensorFlow variables need to be initialized with values, e.g. with zeros.
W = tf.Variable(tf.zeros([4, 3]))  # 4-dimensional input and 3 classes
# Randomly sample from a normal distribution with standard deviation .01
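The rest of this cell is truncated. A minimal sketch of the full step, defining W and b as zero tensors as the text describes; the commented-out lines show the random-normal initialization the surviving comment refers to, with its stated standard deviation of .01:

```python
W = tf.Variable(tf.zeros([4, 3]))  # 4-dimensional input and 3 classes
b = tf.Variable(tf.zeros([3]))     # one bias value per class

# Alternative: randomly sample from a normal distribution with standard deviation .01
# W = tf.Variable(tf.random_normal([numFeatures, numLabels], mean=0.0, stddev=0.01))
# b = tf.Variable(tf.random_normal([1, numLabels], mean=0.0, stddev=0.01))
```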
Logistic Regression model
We now define the operations we need in order to properly run the Logistic Regression. Logistic regression is typically thought of as a single equation:

$$\hat{y} = \sigma(WX + b)$$

However, for the sake of clarity, we can break it into its three main components:
- a weight times features matrix multiplication operation,
- a summation of the weighted features and a bias term,
- and finally the application of a sigmoid function.
As such, you will find these components defined as three separate operations below.
# Three-component breakdown of the Logistic Regression equation.
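A sketch of the three operations, using the W, b, and X defined above (the *_OP names are assumptions carried through the rest of these notes):

```python
# Three-component breakdown of the Logistic Regression equation.
# Note that these feed into each other.
apply_weights_OP = tf.matmul(X, W, name="apply_weights")       # weight-times-features matmul
add_bias_OP = tf.add(apply_weights_OP, b, name="add_bias")     # add the bias term
activation_OP = tf.nn.sigmoid(add_bias_OP, name="activation")  # sigmoid / logistic function
```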
As we have seen before, the function we are going to use is the logistic function $\frac{1}{1+e^{-x}}$, which is fed the input data after applying weights and bias. In TensorFlow, this function is implemented as tf.nn.sigmoid. Effectively, it maps the weighted input plus bias onto a curve between 0 and 1 (0 to 100 percent), which is the probability function we want.
Training
The learning algorithm is how we search for the best weight vector (${\bf w}$). This search is an optimization problem looking for the hypothesis that optimizes an error/cost measure.
What tells us that our model is bad?
The cost (or loss) of the model, so what we want is to minimize it.
What is the cost function in our model?
The cost function we are going to utilize is the Squared Mean Error loss function, which penalizes the squared difference between the predicted probabilities and the true labels.
How to minimize the cost function?
We can’t use least-squares linear regression here, so we will use gradient descent instead. Specifically, we will use batch gradient descent, which calculates the gradient from all data points in the data set.
Cost function
Before defining our cost function, we need to decide how long we are going to train and what learning rate to use.
# Number of Epochs in our training
# Defining our cost function - Squared Mean Error
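Both of these cells are truncated. A sketch of the training setup; the epoch count and learning rate are illustrative values, and tf.nn.l2_loss serves as one reasonable stand-in for the squared-error cost the text mentions:

```python
# Number of Epochs in our training (illustrative value)
numEpochs = 700

# Learning rate for gradient descent (illustrative value)
learningRate = 0.0008

# Defining our cost function - Squared Mean Error
# tf.nn.l2_loss computes half the sum of squared prediction errors
cost_OP = tf.nn.l2_loss(activation_OP - yGold, name="squared_error_cost")

# Batch gradient descent: each step uses the gradient over the whole training set
training_OP = tf.train.GradientDescentOptimizer(learningRate).minimize(cost_OP)
```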
Now we move on to actually running our operations. We will start with the operations involved in the prediction phase (i.e. the logistic regression itself).
First, we need to initialize our weights and biases, with zeros or random values, via the built-in Initialization Op, tf.initialize_all_variables(). This Initialization Op will become a node in our computational graph, and when we put the graph into a session, the Op will run and create the variables.
# Create a tensorflow session
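A sketch of the session and initialization step described above:

```python
# Create a tensorflow session
sess = tf.Session()

# Initialization Op: creates the variables with their starting values
init_OP = tf.initialize_all_variables()

# Running the Op inside the session actually initializes W and b
sess.run(init_OP)
```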
We also want some additional operations to keep track of our model’s performance over time. We can do this like so:
# argmax(activation_OP, 1) returns the label with the most probability
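A sketch of these accuracy-tracking operations:

```python
# argmax(activation_OP, 1) returns the label with the highest predicted probability;
# argmax(yGold, 1) is the true label, so tf.equal checks whether they match
correct_predictions_OP = tf.equal(tf.argmax(activation_OP, 1), tf.argmax(yGold, 1))

# Average the boolean correctness values to get the accuracy on the fed data
accuracy_OP = tf.reduce_mean(tf.cast(correct_predictions_OP, "float"))
```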
Now we can define and run the actual training loop, like this:
# Initialize reporting variables
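A sketch of the training loop, feeding the full training set each epoch (batch gradient descent) and recording accuracy and cost; reporting every 10 epochs is my own assumption:

```python
# Initialize reporting variables
cost = 0
epoch_values, accuracy_values, cost_values = [], [], []

for i in range(numEpochs):
    # One step of batch gradient descent over the whole training set
    sess.run(training_OP, feed_dict={X: trainX, yGold: trainY})

    # Report progress every 10 epochs
    if i % 10 == 0:
        train_accuracy, newCost = sess.run([accuracy_OP, cost_OP],
                                           feed_dict={X: trainX, yGold: trainY})
        epoch_values.append(i)
        accuracy_values.append(train_accuracy)
        cost_values.append(newCost)
        print("step %d, training accuracy %g, cost %g, change in cost %g"
              % (i, train_accuracy, newCost, abs(newCost - cost)))
        cost = newCost

# Final accuracy on the held-out test set
print("final accuracy on test set: %s"
      % str(sess.run(accuracy_OP, feed_dict={X: testX, yGold: testY})))
```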
Why don’t we plot the cost to see how it behaves?
%matplotlib inline
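A sketch of the plot, using the cost values recorded during the training loop above:

```python
import matplotlib.pyplot as plt

# Plot the cost recorded every 10 epochs to see how it decreases over training
plt.plot(epoch_values, cost_values)
plt.xlabel("Epoch")
plt.ylabel("Cost")
plt.show()
```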
Assuming no parameters were changed, you should reach a peak accuracy of 90% by the end of training, which is commendable. Try changing parameters such as the length of training, and maybe some of the operations, to see how the model behaves. Does it take much longer? How is the performance?