What is TensorFlow? Installation, Basics, and more

TensorFlow is an open-source library for numerical computation and large-scale machine learning, created by the Google Brain team, that eases the process of acquiring data, training models, serving predictions, and refining future results.

TensorFlow bundles together machine learning and deep learning models and algorithms. It uses Python as a convenient front end, while the computations themselves run in optimized C++.

TensorFlow allows developers to create a graph of computations to perform. Each node in the graph represents a mathematical operation, and each connection represents data. Hence, instead of dealing with low-level details like figuring out proper ways to hitch the output of one function to the input of another, the developer can focus on the overall logic of the application.

Google Brain, the deep learning research team at Google, developed TensorFlow for Google’s internal use and released it as open source in 2015. The research team uses this open-source software library to perform several important tasks.
TensorFlow is, at present, one of the most popular deep learning libraries. Several real-world applications of deep learning make TensorFlow popular. Being an open-source library for deep learning and machine learning, TensorFlow plays a role in text-based applications, image recognition, voice search, and many more. Image recognition systems such as Facebook’s DeepFace and voice recognition in assistants such as Apple’s Siri are well-known examples of these kinds of applications. Many Google products use TensorFlow to improve the user experience.


What are Tensors?

All the computations associated with TensorFlow involve the use of tensors.

A tensor is a vector/matrix of n-dimensions representing types of data. Values in a tensor hold identical data types with a known shape, and this shape is the dimensionality of the matrix. A vector is a one-dimensional tensor; a matrix is a two-dimensional tensor. A scalar is a zero-dimensional tensor.
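The rank and shape vocabulary above can be illustrated with NumPy arrays, which follow the same conventions. This is only a sketch: it uses NumPy rather than TensorFlow, and the variable names are chosen for illustration.

```python
import numpy as np

scalar = np.array(5.0)               # zero-dimensional tensor, shape ()
vector = np.array([1.0, 2.0, 3.0])   # one-dimensional tensor, shape (3,)
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])      # two-dimensional tensor, shape (2, 2)
cube = np.zeros((2, 2, 2))           # three-dimensional tensor, shape (2, 2, 2)

# Every value inside one array shares the same data type (dtype)
for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(name, "rank:", t.ndim, "shape:", t.shape, "dtype:", t.dtype)
```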

In the graph, computations are made possible through interconnections of tensors. The nodes of the graph carry out the mathematical operations, whereas the edges, which are the tensors, describe the input-output relationships between nodes.
Thus TensorFlow takes an input in the form of an n-dimensional array/matrix (known as a tensor), which flows through a system of several operations and comes out as output. Hence the name TensorFlow. A graph can be constructed to perform the necessary operations on the way to the output.

How to Install Tensorflow?

Assuming you have a Python and Jupyter Notebook setup, TensorFlow can be installed directly via pip:

pip3 install --upgrade tensorflow

If you need GPU support, you will have to install tensorflow-gpu instead of tensorflow.

To test your installation, simply run the following:

$ python -c "import tensorflow; print(tensorflow.__version__)"
2.0.0

Tensorflow Basics

TensorFlow’s name is directly derived from its core component, the tensor. A tensor is a vector or matrix of n dimensions that can represent all types of data.


The shape is the dimensionality of the matrix. In the image above, the shape of the tensor is (2, 2, 2).


Type represents the kind of data (integers, strings, floating-point values, etc.). All values in a tensor hold identical data types. 


The graph is a set of computations that takes place successively on input tensors. Basically, a graph is just an arrangement of nodes that represent the operations in your model. 


The session encapsulates the environment in which the evaluation of the graph takes place.


Operators are pre-defined basic mathematical operations. Examples: 

tf.add(a, b)
tf.subtract(a, b)

Tensorflow also allows users to define custom operators, e.g., increment by 5, which is an advanced use case and out of scope for this article. 
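To make the relationship between graph, operators, and session more concrete, here is a toy computation graph in plain Python. It is not how TensorFlow is implemented; the `Node` class and the helper functions are hypothetical names used only to show that building a graph and evaluating it are two separate steps.

```python
class Node:
    """A graph node: an operation plus the nodes that feed it."""

    def __init__(self, op, inputs=(), name=""):
        self.op = op          # callable that computes this node's value
        self.inputs = inputs  # upstream nodes (the incoming edges)
        self.name = name

    def evaluate(self):
        # Evaluate upstream nodes first, then apply this node's operation
        values = [node.evaluate() for node in self.inputs]
        return self.op(*values)


def constant(value, name=""):
    return Node(lambda: value, name=name)


def add(a, b):
    return Node(lambda x, y: x + y, (a, b), "add")


def multiply(a, b):
    return Node(lambda x, y: x * y, (a, b), "multiply")


# Building the graph performs no arithmetic yet
x = constant(3, "x")
y = constant(4, "y")
f = add(multiply(multiply(x, x), y), add(y, constant(2)))

# Evaluation is a separate step, playing the role of running a session
result = f.evaluate()
print(result)  # 42
```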

Tensorflow Python Simplified 

Creating a Graph and Running it in a Session 

A tensor is an object with three properties: 

  • A unique label (name)
  • A dimension (shape)
  • A data type (dtype) 

Each operation you will do with TensorFlow involves the manipulation of a tensor. There are four main tensors that you can create: 

  • tf.Variable
  • tf.constant
  • tf.placeholder
  • tf.SparseTensor

Constants are (guess what!) constants. As their name states, their value doesn’t change. We’d usually need our network parameters to be updated, and that’s where variables come into play.

The following code creates the graph represented in Figure 1:

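# This article uses the TensorFlow 1.x graph-and-session API. On TensorFlow 2.x,
# use instead: import tensorflow.compat.v1 as tf; tf.disable_v2_behavior()
import tensorflow as tf

x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = ((x * x) * y) + (y + 2)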

The most important thing to understand is that this code does not actually perform any computation, even though it looks like it does (especially the last line). It just creates a computation graph. In fact, even the variables are not initialized yet. To evaluate this graph, you need to open a TensorFlow session and use it to initialize the variables and evaluate f. A TensorFlow session takes care of placing the operations onto devices such as CPUs and GPUs and running them, and it holds all the variable values.

The following code creates a session, initializes the variables, evaluates f, and then closes the session (which frees up resources):

sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)  # 42
sess.close()

There is also a better way:

with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

Inside the ‘with’ block, the session is set as the default session. Calling x.initializer.run() is equivalent to calling tf.get_default_session().run(x.initializer), and similarly f.eval() is equivalent to calling tf.get_default_session().run(f). This makes the code easier to read. Moreover, the session is automatically closed at the end of the block.

Instead of manually running the initializer for every single variable, you can use the global_variables_initializer() function. Note that it does not actually perform the initialization immediately but rather creates a node in the graph that will initialize all variables when it is run:

init = tf.global_variables_initializer()  # prepare an init node

with tf.Session() as sess:
    init.run()  # actually initialize all the variables
    result = f.eval()

Linear Regression with TensorFlow

What is Linear Regression?

Imagine you have two variables, x and y, and your task is to predict the value of y knowing the value of x. If you plot the data, you can see a positive relationship between your independent variable, x, and your dependent variable, y.

You may observe that if x = 1, y will roughly be equal to 6, and if x = 2, y will be around 8.5.

Reading values off the plot in this way is not very accurate and is prone to error, especially with a dataset with hundreds of thousands of points.

Linear regression is evaluated with an equation. The variable y is explained by one or many covariates. In your example, there is only one covariate, x. If you have to write this equation, it will be:

$$y = \alpha + \beta x + \epsilon$$

With:

  • $\alpha$ is the bias, i.e., if x = 0, y = $\alpha$
  • $\beta$ is the weight associated with x, i.e., if x = 1, y = $\alpha + \beta$
  • $\epsilon$ is the residual or error of the model. It includes what the model cannot learn from the data.

Imagine you fit the model, and you find the following solution:

  • $\alpha$ = 3.8
  • $\beta$ = 2.78

You can substitute those numbers in the equation, and it becomes: y = 3.8 + 2.78x
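As a quick illustration, the fitted line can be written as a small Python function. The bias and weight are the values quoted in the text; the function name and the sample inputs are chosen only for this sketch.

```python
BIAS = 3.8     # fitted bias quoted in the text
WEIGHT = 2.78  # fitted weight quoted in the text


def predict(x):
    """Return the fitted value of y for a given x."""
    return BIAS + WEIGHT * x


xs = [0, 1, 2, 3]  # sample inputs chosen for illustration
predictions = [round(predict(x), 2) for x in xs]
print(predictions)
```

Once the two numbers are known, a prediction for any x, including values outside the observed range, is a single multiplication and addition.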

You now have a better way to find the values for y. That is, you can replace x with any value you want to predict y. In the image below, we have replaced x in the equation with all the values in the dataset and plotted the result.

The red line represents the fitted value, that is, the value of y for each value of x. You don’t need to have observed a given x to predict y: for each x, there is a y that lies on the red line. You can also predict y for values of x higher than 2.

The algorithm will choose a random number for the bias and for each weight and plug in the value of x to get the predicted value of y. If the dataset has 100 observations, the algorithm computes 100 predicted values.

We can compute the error of the model, noted $\epsilon$, which is the difference between the real and predicted values. A positive error means the model underestimates the prediction of y, and a negative error means the model overestimates the prediction of y.

$$\epsilon = y - y_{pred}$$

Your goal is to minimize the square of the error. The algorithm computes the mean of the squared error. This step is called the minimization of the error. Mathematically, the quantity being minimized is the mean squared error (MSE).


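$$MSE(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - \theta^T x_i \right)^2$$

Where:

  • $\theta$ is the vector of weights, so $\theta^T x_i$ refers to the predicted value
  • $y_i$ is the real value
  • m is the number of observations

The goal is to find the best $\theta$ that minimizes the MSE.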
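The error and the mean squared error can be computed in a few lines of plain Python. The observations below are hypothetical numbers made up for illustration, and the bias and weight are the fitted values quoted earlier.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between real and predicted values."""
    m = len(y_true)
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / m


# Hypothetical observations, for illustration only
xs = [1.0, 2.0, 3.0]
ys = [6.0, 8.5, 12.0]

bias, weight = 3.8, 2.78
y_pred = [bias + weight * x for x in xs]

# error = real value - predicted value; negative means the model overestimates
errors = [y - yp for y, yp in zip(ys, y_pred)]
mse = mean_squared_error(ys, y_pred)
print(errors)
print(mse)
```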

If the average error is large, it means the model performs poorly, and the weights are not chosen properly. To correct the weights, you need to use an optimizer. The traditional optimizer is called Gradient Descent. 

Gradient descent takes the derivative of the error with respect to each weight and decreases or increases the weight accordingly. If the derivative is positive, the weight is decreased. If the derivative is negative, the weight is increased. The model will update the weights and recompute the error. This process is repeated until the error does not change anymore. In addition, the gradients are multiplied by a learning rate, which controls the size of each update and therefore the speed of learning.

If the learning rate is too small, it will take a very long time for the algorithm to converge (i.e., it requires lots of iterations). If the learning rate is too high, the algorithm might never converge.
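The update rule and the effect of the learning rate can be sketched in plain Python for a model with one bias and one weight. The data are synthetic and noise-free, generated from the line y = 3.8 + 2.78x, and the function name and learning rates are choices made for this sketch only.

```python
def fit_line(xs, ys, learning_rate, n_epochs):
    """Fit y = bias + weight * x by batch gradient descent on the MSE."""
    bias, weight = 0.0, 0.0
    m = len(xs)
    for _ in range(n_epochs):
        errors = [(bias + weight * x) - y for x, y in zip(xs, ys)]
        # Derivatives of the MSE with respect to the bias and the weight
        grad_bias = 2.0 / m * sum(errors)
        grad_weight = 2.0 / m * sum(e * x for e, x in zip(errors, xs))
        # Move against the gradient, scaled by the learning rate
        bias -= learning_rate * grad_bias
        weight -= learning_rate * grad_weight
    return bias, weight


# Synthetic, noise-free data generated from y = 3.8 + 2.78x
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [3.8 + 2.78 * x for x in xs]

bias, weight = fit_line(xs, ys, learning_rate=0.1, n_epochs=2000)
print(round(bias, 3), round(weight, 3))

# A learning rate that is too high makes each update overshoot, so the
# parameters move further from the solution at every step
bad_bias, bad_weight = fit_line(xs, ys, learning_rate=1.5, n_epochs=50)
print(abs(bad_weight) > 1e6)
```

With the moderate learning rate the parameters settle on the values used to generate the data, while the large one diverges.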

Predict Prices for California Houses

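scikit-learn provides tools to load larger datasets, downloading them if necessary. We’ll be using the California Housing dataset for this regression problem.

We fetch the dataset, add an extra bias input feature to all training instances, and scale the features, since gradient descent converges poorly on unscaled inputs.

import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

# Standardized copy of the features; the training code below uses this version
scaled_housing_data = StandardScaler().fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]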

The following code performs a linear regression on the dataset:

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = tf.gradients(mse, [theta])[0]
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()

The main loop executes the training step over and over again (n_epochs times), and every 100 iterations, it prints out the current Mean Squared Error (MSE). 

TensorFlow’s autodiff feature can automatically and efficiently compute the gradients for you. The gradients() function takes an op (in this case mse) and a list of variables (in this case, just theta), and it creates a list of ops (one per variable) to compute the gradients of the op with regard to each variable. So the gradients node will compute the gradient vector of the MSE with regard to theta.
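To see what such a gradient node has to produce, the gradient of the MSE can be derived by hand for a model with one bias and one weight and compared against a finite-difference estimate. This sketch is neither autodiff nor TensorFlow code; it only checks the quantity being computed, and the data and function names are invented for illustration.

```python
def mse(theta, xs, ys):
    bias, weight = theta
    m = len(xs)
    return sum((bias + weight * x - y) ** 2 for x, y in zip(xs, ys)) / m


def analytic_gradient(theta, xs, ys):
    """Hand-derived gradient of the MSE with respect to (bias, weight)."""
    bias, weight = theta
    m = len(xs)
    errors = [bias + weight * x - y for x, y in zip(xs, ys)]
    return [2.0 / m * sum(errors),
            2.0 / m * sum(e * x for e, x in zip(errors, xs))]


def numerical_gradient(theta, xs, ys, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grads = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grads.append((mse(plus, xs, ys) - mse(minus, xs, ys)) / (2 * eps))
    return grads


# Made-up data and parameters, for illustration only
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 5.0]
theta = [0.5, -1.0]

exact = analytic_gradient(theta, xs, ys)
approx = numerical_gradient(theta, xs, ys)
print(exact)
print(approx)
```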

Linear Classification with Tensorflow

What is Linear Classification?

Classification aims to predict each class’s probability given a set of inputs. The label (i.e., the dependent variable) is a discrete value called a class. 

1. The learning algorithm is a binary classifier if the label has only two classes.
2. The multiclass classifier tackles labels with more than two classes.

For instance, a typical binary classification problem is to predict the likelihood a customer makes a second purchase. Predicting the type of animal displayed on a picture is a multiclass classification problem since there are more than two varieties of animals existing. 

For a binary task, the label can have two possible integer values. In most cases, it is either [0,1] or [1,2]. For instance, the objective is to predict whether a customer will buy a product or not. The label is defined as follows:

Y = 1 (customer purchased the product)
Y = 0 (customer does not purchase the product) 

The model uses the features X to classify each customer into the most likely class, namely, a potential buyer or not. The probability of success is computed with logistic regression. The algorithm computes a probability based on the features X and predicts a success when this probability is above 50 percent. More formally, the probability is calculated as follows:

$$P(y = 1 \mid x) = \frac{1}{1 + e^{-(\theta^T x + b)}}$$

Where $\theta$ is the set of weights, $x$ the features, and $b$ is the bias.

The function can be decomposed into two parts: 

  • The linear model
  • The logistic function 

Linear model 

You are already familiar with the way the weights are computed. The linear output is computed using a dot product, $\theta^T x + b$, so Y is a linear function of all the features x. If the model does not have features, the prediction is equal to the bias, b.

The weights indicate the direction of the correlation between the features x and the label y. A positive correlation increases the probability of the positive class, while a negative correlation leads the probability closer to 0 (i.e., the negative class).

The linear model returns only real numbers, which is inconsistent with the probability measure of range [0,1]. The logistic function is required to convert the linear model output to a probability.

Logistic function

The logistic function, or sigmoid function, has an S-shape and the output of this function is always between 0 and 1.

It is easy to substitute the linear regression output into the sigmoid function. It results in a new number that can be read as a probability between 0 and 1.

The classifier can transform the probability into a class:

Values from 0 up to, but not including, 0.5 become class 0
Values from 0.5 to 1 become class 1
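The two parts, the linear model and the logistic function, followed by the 0.5 threshold, can be sketched in plain Python. The weights, bias, and feature vectors below are hypothetical values picked for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    z = sum(w * x for w, x in zip(weights, features)) + bias  # linear model
    return sigmoid(z)                                         # logistic function


def predict_class(probability, threshold=0.5):
    return 1 if probability >= threshold else 0


# Hypothetical weights, bias, and feature vectors
weights, bias = [0.8, -0.4], -0.2
results = []
for features in ([2.0, 1.0], [0.0, 3.0]):
    p = predict_proba(features, weights, bias)
    results.append((p, predict_class(p)))
    print(features, round(p, 3), predict_class(p))
```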

How to Measure the performance of Linear Classifier? 


The overall performance of a classifier is measured with the accuracy metric. Accuracy collects all the correct values divided by the total number of observations. For instance, an accuracy value of 80 percent means the model is correct in 80 percent of the cases.

This metric has a shortcoming, especially for imbalanced classes. An imbalanced dataset occurs when the number of observations per group is not equal. Let’s say you try to classify a rare event with a logistic function. Imagine the classifier trying to estimate the death of a patient following a disease. In the data, 5 percent of the patients pass away. You can train a classifier to predict the number of deaths and use the accuracy metric to evaluate its performance. If the classifier predicts 0 deaths for the entire dataset, it will be correct in 95 percent of the cases.

Confusion matrix 

A better way to assess the performance of a classifier is to look at the confusion matrix.

Precision & Recall

Recall: The ability of a classification model to identify all relevant instances
Precision: The ability of a classification model to return only relevant instances
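The sketch below computes the four cells of a confusion matrix and derives accuracy, precision, and recall from them, reusing the rare-event scenario of 5 percent positive cases. The predictions are hypothetical, and returning 0.0 when a ratio is undefined is a convention chosen for this example.

```python
def confusion_counts(y_true, y_pred):
    """Return (true positives, true negatives, false positives, false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 when undefined


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0  # 0.0 when undefined


# 100 hypothetical patients, 5 of whom belong to the positive class
y_true = [1] * 5 + [0] * 95

# Classifier A always predicts the negative class
always_negative = [0] * 100
# Classifier B finds 4 of the 5 positives but raises 6 false alarms
screening = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

for name, y_pred in [("A", always_negative), ("B", screening)]:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    print(name, accuracy(tp, tn, fp, fn), precision(tp, fp), recall(tp, fn))
```

Classifier A has the higher accuracy but a recall of zero, which is the failure that the accuracy metric alone hides.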

Classification of Income Level using Census Dataset 

Load Data. The data stored online are already divided between a train set and a test set.

import tensorflow as tf
import pandas as pd

## Define path data
COLUMNS = ['age', 'workclass', 'fnlwgt', 'education', 'education_num', 'marital',
           'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss',
           'hours_week', 'native_country', 'label']
PATH = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
PATH_test = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test"

df_train = pd.read_csv(PATH, skipinitialspace=True, names=COLUMNS, index_col=False)
df_test = pd.read_csv(PATH_test, skiprows=1, skipinitialspace=True, names=COLUMNS, index_col=False)

TensorFlow requires a numeric label to train the classifier, so you need to cast the values from string to integer. The label is stored as an object (string) column. The code below creates a dictionary with the values to convert and loops over the column items. Note that you perform this operation twice, once for the train set and once for the test set, because the labels in the test file end with a period.

label = {'<=50K': 0, '>50K': 1}
df_train.label = [label[item] for item in df_train.label]
label_t = {'<=50K.': 0, '>50K.': 1}
df_test.label = [label_t[item] for item in df_test.label]

Define the model.

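# feature_columns must be tf.feature_column objects, not column names; as a
# simplifying assumption, only the numeric columns are used here.
CONTI_FEATURES = ['age', 'fnlwgt', 'education_num', 'capital_gain', 'capital_loss', 'hours_week']
feature_columns = [tf.feature_column.numeric_column(k) for k in CONTI_FEATURES]
model = tf.estimator.LinearClassifier(
    n_classes=2,
    model_dir="ongoing/train",
    feature_columns=feature_columns)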

Train the model.

LABEL = 'label'

def get_input_fn(data_set, num_epochs=None, n_batch=128, shuffle=True):
    return tf.estimator.inputs.pandas_input_fn(
        x=pd.DataFrame({k: data_set[k].values for k in COLUMNS}),
        y=pd.Series(data_set[LABEL].values),
        batch_size=n_batch,
        num_epochs=num_epochs,
        shuffle=shuffle)

model.train(input_fn=get_input_fn(df_train,
                                  num_epochs=None,
                                  n_batch=128,
                                  shuffle=False),
            steps=1000)

Evaluate the model.

model.evaluate(input_fn=get_input_fn(df_test,
                                     num_epochs=1,
                                     n_batch=128,
                                     shuffle=False),
               steps=1000)

Visualizing the Graph

So now we have a computation graph that trains a Linear Regression model using Mini-batch Gradient Descent, and we are saving checkpoints at regular intervals. However, we are still relying on the print() function to visualize progress during training. There is a better way: enter TensorBoard. If you feed it some training stats, it will display nice interactive visualizations of these stats in your web browser (e.g., learning curves). You can also provide it with the graph’s definition, and it will give you a great interface to browse through it. This is very useful for identifying errors in the graph, finding bottlenecks, and so on.

The first step is to tweak your program a bit, so it writes the graph definition and some training stats – for example, the training error (MSE) – to a log directory that TensorBoard will read from. You need to use a different log directory every time you run your program, or else TensorBoard will merge stats from different runs, which will mess up the visualizations. The simplest solution for this is to include a timestamp in the log directory name. Add the following code at the beginning of the program:

from datetime import datetime

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

Next, add the following code at the very end of the construction phase:

mse_summary = tf.summary.scalar('MSE', mse)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

The first line creates a node in the graph that will evaluate the MSE value and write it to a TensorBoard-compatible binary log string called a summary. The second line creates a FileWriter that you will use to write summaries to logfiles in the log directory. The first parameter indicates the path of the log directory (in this case, something like tf_logs/run-20200229130405/, relative to the current directory). The second (optional) parameter is the graph you want to visualize. Upon creation, the FileWriter creates the log directory if it does not already exist (and its parent directories if needed) and writes the graph definition in a binary logfile called an events file.

Next, you need to update the execution phase to evaluate the mse_summary node regularly during training (e.g., every 10 mini-batches). This will output a summary that you can then write to the events file using the file_writer. Finally, the file_writer needs to be closed at the end of the program. Here is the updated code:

for batch_index in range(n_batches):
    X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
    if batch_index % 10 == 0:
        summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
        step = epoch * n_batches + batch_index
        file_writer.add_summary(summary_str, step)
    sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

# once training has finished
file_writer.close()

Now when you run the program, it will create the log directory tf_logs/run-20200229130405 and write an events file in this directory, containing both the graph definition and the MSE values. If you run the program again, a new directory will be created under the tf_logs directory, e.g., tf_logs/run-20200229130526. Now that we have the data, let’s fire up the TensorBoard server. To do so, simply run the tensorboard command, pointing it to the root log directory. This starts the TensorBoard web server, listening on port 6006 (which is “goog” written upside down):

$ tensorboard --logdir tf_logs/
Starting TensorBoard on port 6006
(You can navigate to

What is Artificial Neural Network?

An Artificial Neural Network (ANN) is composed of four principal objects:

  • Layers: all the learning occurs in the layers. There are three types of layers:

1. Input
2. Hidden
3. Output

  • Feature and Label: Input data to the network(features) and output from the network (labels)
  • Loss function: Metric used to estimate the performance of the learning phase
  • Optimizer: Improve learning by updating the knowledge in the network.

A neural network will take the input data and push them into an ensemble of layers. The network needs to evaluate its performance with a loss function. The loss function gives the network an idea of the path it needs to take before it masters the knowledge. The network needs to improve its knowledge with the help of an optimizer.

The program takes some input values and pushes them into two fully connected layers. Imagine you have a math problem, the first thing you do is to read the corresponding chapter to solve the problem. You apply your new knowledge to solve the problem. There is a high chance you will not score very well. It is the same for a network. The first time it sees the data and makes a prediction, it will not match perfectly with the actual data. 

To improve its knowledge, the network uses an optimizer. In our analogy, an optimizer can be thought of as rereading the chapter. You gain new insights/lessons by reading again. Similarly, the network uses the optimizer, updates its knowledge, and tests its new knowledge to check how much it still needs to learn. The program will repeat this step until it makes the lowest error possible. 

In our math problem analogy, you read the textbook chapter many times until you thoroughly understand the course content. If you keep making errors even after reading it multiple times, you have reached the knowledge capacity of the current material. You need to use different textbooks or test different methods to improve your score. For a neural network, it is the same process. If the accuracy is still far from 100% but the learning curve is flat, it means that, with the current architecture, the network cannot learn anything else. The network has to be better optimized to improve the knowledge.

Neural Network Architecture


A layer is where all the learning takes place. Inside a layer, there are a large number of weights (neurons). A typical neural network is composed of densely connected layers (also called fully connected layers). It means all the inputs are connected to all the outputs.

A typical neural network takes a vector of inputs and a scalar that contains the label. The simplest setup is a binary classification with only two classes: 0 and 1.

  1. The first node is the input value.
  2. The neuron is decomposed into the input part and the activation function. The left part receives all the inputs from the previous layer. The right part passes the sum of the inputs into an activation function.
  3. The output value is computed from the hidden layers and used to make a prediction. For classification, the number of outputs is equal to the number of classes. For regression, only one value is predicted.
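The three steps above can be sketched as a forward pass through one fully connected hidden layer and one output neuron. All weights, biases, and inputs below are hypothetical values picked so that the arithmetic is easy to follow.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def neuron(inputs, weights, bias, activation):
    # Left part: weighted sum of all inputs from the previous layer
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Right part: the sum is passed into the activation function
    return activation(z)


def dense_layer(inputs, weight_rows, biases, activation):
    """Fully connected layer: every neuron receives every input."""
    return [neuron(inputs, w, b, activation)
            for w, b in zip(weight_rows, biases)]


inputs = [1.0, 2.0]

# Hidden layer with two neurons (hypothetical weights and biases)
hidden = dense_layer(inputs, [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0], relu)

# Output layer with a single neuron, as in a regression setting
output = dense_layer(hidden, [[1.0, 0.5]], [0.1], identity)

print(hidden)
print(output)
```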

Activation function 

The activation function of a node defines the output given a set of inputs. You need an activation function to allow the network to learn non-linear patterns. A common activation function is ReLU, the rectified linear unit. The function gives a zero for all negative values.

The other activation functions are: 

  • Piecewise Linear
  • Sigmoid
  • Tanh
  • Leaky Relu 
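Several of these activation functions fit in one line of plain Python each. The sketch below covers ReLU, Leaky ReLU, sigmoid, and tanh; the slope of 0.01 for Leaky ReLU is a commonly used default and an assumption here, not something the text specifies.

```python
import math


def relu(z):
    return max(0.0, z)                 # zero for all negative values


def leaky_relu(z, alpha=0.01):
    return z if z > 0 else alpha * z   # small slope instead of a hard zero


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))  # output between 0 and 1


def tanh(z):
    return math.tanh(z)                # output between -1 and 1


for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), leaky_relu(z), round(sigmoid(z), 3), round(tanh(z), 3))
```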

The critical decisions to make when building a neural network are:

  • How many layers in the neural network
  • How many hidden units for each layer 

A neural network with lots of layers and hidden units can learn a complex representation of the data, but it makes the network’s computation very expensive. 

Loss function

After you have defined the hidden layers and the activation function, you need to specify the loss function and the optimizer. 

It is common practice to use a binary cross entropy loss function for binary classification. In linear regression, you use the mean square error. 

The loss function is an important metric to estimate the performance of the optimizer. During the training, this metric will be minimized. You must select this quantity carefully depending on the problem you are dealing with. 
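As an illustration of the binary cross-entropy loss mentioned above, the sketch below computes it in plain Python for two sets of hypothetical predicted probabilities. The clipping constant `eps` is an implementation detail added here to avoid taking the logarithm of zero.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy between labels (0 or 1) and probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
good_predictions = [0.9, 0.1, 0.8, 0.2]  # confident and correct
bad_predictions = [0.1, 0.9, 0.2, 0.8]   # confident and wrong

good_loss = binary_cross_entropy(labels, good_predictions)
bad_loss = binary_cross_entropy(labels, bad_predictions)
print(round(good_loss, 4), round(bad_loss, 4))
```

The loss is small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong, which is what the optimizer exploits during training.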


The loss function is a measure of the model’s performance. The optimizer will help improve the weights of the network in order to decrease the loss. There are different optimizers available, but the most common one is the Stochastic Gradient Descent. 

The conventional optimizers are: 

  • Momentum optimization,
  • Nesterov Accelerated Gradient,
  • AdaGrad,
  • Adam optimization 

Example Neural Network in TensorFlow 

We will use the MNIST dataset to train your first neural network. Training a neural network with TensorFlow is not very complicated. The preprocessing step looks precisely the same as in the previous tutorials. You will proceed as follows:

  • Step 1: Import the data
  • Step 2: Transform the data
  • Step 3: Construct the tensor
  • Step 4: Build the model
  • Step 5: Train and evaluate the model
  • Step 6: Improve the model

import numpy as np
import tensorflow as tf
np.random.seed(42)

# Note: fetch_mldata was removed in scikit-learn 0.22; newer versions
# provide fetch_openml('mnist_784', version=1) instead.
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata(' /Users/Thomas/Dropbox/Learning/Upwork/tuto_TF/data/mldata/MNIST original')
print(mnist.data.shape)
print(mnist.target.shape)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    mnist.data, mnist.target, test_size=0.2, random_state=42)
y_train = y_train.astype(int)
y_test = y_test.astype(int)
batch_size = len(X_train)
print(X_train.shape, y_train.shape, y_test.shape)

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))
# Reuse the scaling learned on the training set instead of refitting on test data
X_test_scaled = scaler.transform(X_test.astype(np.float64))

feature_columns = [tf.feature_column.numeric_column('x', shape=X_train_scaled.shape[1:])]
estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[300, 100],
    n_classes=10,
    model_dir='/train/DNN')

Train and evaluate the model

# Train the estimator
train_input = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_train_scaled},
    y=y_train,
    batch_size=50,
    shuffle=False,
    num_epochs=None)
estimator.train(input_fn=train_input, steps=1000)

# Evaluate the estimator on the test set
eval_input = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_test_scaled},
    y=y_test,
    shuffle=False,
    batch_size=X_test_scaled.shape[0],
    num_epochs=1)
estimator.evaluate(eval_input, steps=None)

Tensorflow Graphs

A TensorFlow graph is a set of connected nodes, sometimes referred to as vertices, and the connections are referred to as edges. Each node takes inputs and applies an operation to them to produce an output.

In the above diagram, n1 and n2 are the two nodes having values 1 and 2, respectively, and an adding operation that happens at node n3 will help us get the output. We will try to perform the same operation using Tensorflow in Python.

We will import TensorFlow and define the nodes n1 and n2 first.

import tensorflow as tf
node1 = tf.constant(1)
node2 = tf.constant(2)

Now we perform the addition, which will be the output:

node3 = node1 + node2

Now, remember we have to run a TensorFlow session in order to get the output. We will use the ‘with’ command in order to auto-close the session after executing the output.

with tf.Session() as sess:
    result = sess.run(node3)

This is how the TensorFlow graph works.

After a quick overview of the tensor graph, it is essential to know the objects used in a tensor graph. Basically, there are two types of objects used in a tensor graph.

a) Variables

b) Placeholders.

Variables and Placeholders.


During the optimization process, TensorFlow tunes the model by adjusting the parameters present in the model. Variables are the part of a tensor graph that holds the values of the weights and biases obtained throughout the session. They need proper initialization, which we will cover in the example below.


Placeholders are also objects of a tensor graph. They are initially empty and are used to feed in the actual training examples. They require a declared data type, such as tf.float32, and take an optional shape argument.

Let’s jump into the example to explain these two objects.
First, we import TensorFlow.

import tensorflow as tf

It is always important to run a session when we use TensorFlow. So, we will run an interactive session to perform the further task.

sess = tf.InteractiveSession()

In order to define a variable, we can take some random numbers ranging from 0 to 1 in a 4×4 matrix.

my_tensor = tf.random_uniform((4,4),0,1)
my_variable = tf.Variable(initial_value=my_tensor)

In order to see the values of the variable, we need to create the global variables initializer and run it. Let us do that.

init = tf.global_variables_initializer()
sess.run(init)

Now we can run the variable in the session to see the output, i.e., its values:

sess.run(my_variable)

array ([[ 0.18764639, 0.76903498, 0.88519645, 0.89911747],
       [ 0.18354201, 0.63433743, 0.42470503, 0.27359927],
       [ 0.45305872, 0.65249109, 0.74132109, 0.19152677],
       [ 0.60576665, 0.71895587, 0.69150388, 0.33336747]], dtype=float32)

So, these are the values of the variable, ranging from 0 to 1, in a 4 by 4 matrix.
Now it is time to create a simple placeholder.
In order to define a placeholder, we need to do the following.

Place_h = tf.placeholder(tf.float64)

Here we use the float64 data type, but we can also use the float32 data type, which takes half the memory and is widely used for model parameters.

For the optional shape argument, we can pass ‘None’ or a fixed size for each dimension. ‘None’ leaves a dimension unspecified, so it can be filled by any number of samples in the data, while the number of features is usually fixed.

Case Study

Now we will work through case studies that perform both regression and classification.

Regression using Tensorflow

Let us deal with the regression first. In order to perform regression, we will use California Housing data, where we will be predicting the value of the blocks using data such as income, population, number of bedrooms, etc.

Let us jump into the data for a quick overview.

import pandas as pd
housing_data = pd.read_csv('cal_housing_clean.csv')

Let us have a quick summary of the data.


Let us select the features and the target variable in order to perform splitting. Splitting is done for training and testing the model.  We can take 70% for training and the rest for testing.

x_data = housing_data.drop(['medianHouseValue'],axis=1)
y_val = housing_data['medianHouseValue']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x_data, y_val, test_size=0.3, random_state=101)

Now scaling is useful for this type of data, as the features are continuous variables measured on very different ranges.

So, we will apply MinMaxScaler from the sklearn library. We fit the scaler on the training data only and then apply it to both the training and testing data.

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# the scaler must be fitted before transform() can be called
scaler.fit(X_train)

X_train=pd.DataFrame(data=scaler.transform(X_train),columns= X_train.columns,index=X_train.index)
X_test=pd.DataFrame(data=scaler.transform(X_test),columns= X_test.columns,index=X_test.index)
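To make the scaling step concrete, here is a minimal NumPy sketch of min-max scaling. It is not how MinMaxScaler is implemented internally; the function names and the toy arrays are made up for illustration. It shows that the minimum and range are learned from the training data and then reused for the test data, so test values can fall outside the 0 to 1 interval.

```python
import numpy as np

def fit_min_max(train):
    """Learn the per-column minimum and range from the training data only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid dividing by zero for constant columns
    return col_min, col_range

def apply_min_max(data, col_min, col_range):
    """Scale data with statistics that were learned beforehand."""
    return (data - col_min) / col_range

# toy data (hypothetical values): two features on very different ranges
train = np.array([[10.0, 200.0], [20.0, 400.0], [30.0, 800.0]])
test = np.array([[15.0, 500.0], [40.0, 100.0]])

col_min, col_range = fit_min_max(train)
train_scaled = apply_min_max(train, col_min, col_range)
test_scaled = apply_min_max(test, col_min, col_range)
print(train_scaled)
print(test_scaled)
```

Reusing the training statistics for the test set keeps information about the test data out of the preprocessing step.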

So, with the above commands, the scaling is done. Now, as we are using TensorFlow estimators, it is necessary to describe each feature as a continuous numeric column. In order to do that, we use the tf.feature_column module.

Let us import TensorFlow and assign each feature column to a variable.

import tensorflow as tf
house_age = tf.feature_column.numeric_column('housingMedianAge')
total_rooms = tf.feature_column.numeric_column('totalRooms')
# 'totalBedrooms' is an assumed column name, following the naming of the other columns
total_bedrooms = tf.feature_column.numeric_column('totalBedrooms')
population_total= tf.feature_column.numeric_column('population')
households = tf.feature_column.numeric_column('households')
total_income = tf.feature_column.numeric_column('medianIncome')
feature_cols= [house_age,total_rooms, total_bedrooms, population_total, households,total_income]

Now let us create an input function for the estimator object. Parameters such as batch size and number of epochs can be tuned as we wish; training for more epochs usually lowers the training error, up to the point where the model starts to overfit. We will use DNN Regressor to predict California’s house value.

input_function=tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train ,batch_size=10,num_epochs=1000,shuffle=True)
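The roles of batch_size, num_epochs, and shuffle can be illustrated with a small pure-Python sketch. It does not reproduce the internals of pandas_input_fn; the generator name and the toy data are hypothetical. It only shows how a dataset is cut into shuffled mini-batches and repeated for several epochs.

```python
import random

def batch_iterator(features, labels, batch_size, num_epochs, shuffle=True, seed=0):
    """Yield (feature_batch, label_batch) pairs, one pass over the data per epoch."""
    rng = random.Random(seed)
    indices = list(range(len(features)))
    for _ in range(num_epochs):
        if shuffle:
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            chunk = indices[start:start + batch_size]
            yield [features[i] for i in chunk], [labels[i] for i in chunk]

# toy data (hypothetical): 25 samples with one feature each
features = [[float(i)] for i in range(25)]
labels = [2.0 * i for i in range(25)]

batches = list(batch_iterator(features, labels, batch_size=10, num_epochs=2))
print(len(batches))                      # number of training steps
print([len(b[0]) for b in batches])      # size of each batch
```

With 25 samples and a batch size of 10, each epoch produces two full batches and one smaller batch, so the number of training steps grows with the number of epochs and shrinks with the batch size.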

While fitting the data, we used 3 hidden layers to build the model. We can also increase the number of layers, but note that adding hidden layers can lead to overfitting, which should be prevented. So, 3 hidden layers are a reasonable choice for this network.

Now for prediction, we need to create a prediction input function and then pass it to the predict() method, which will generate predictions on the test data.

pred_gen =regressor.predict(predict_input_function)

Here pred_gen is a generator that yields the predictions. In order to look into the predictions, we have to convert it to a list.

predictions = list(pred_gen)

Now after the prediction is done, we have to evaluate the model. RMSE or Root Mean Squared Error is a great choice for evaluating regression problems. Let us look into that.

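from sklearn.metrics import mean_squared_error

final_preds = []
for pred in predictions:
    # each prediction is a dictionary; the predicted value sits under 'predictions'
    final_preds.append(pred['predictions'])
rmse = mean_squared_error(y_test, final_preds)**0.5

Now, after we execute, we get an RMSE of 97921.93181985477, which is plausible because RMSE is expressed in the same units as the median house value. So here we go. The regression task is over. Now it is time for classification.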
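For readers who want to see what the RMSE metric computes, here is a minimal sketch in plain Python. The house values below are invented for illustration and are not taken from the dataset.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square the errors, average them, take the square root."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# hypothetical house values in dollars
actual = [200000.0, 150000.0, 300000.0]
predicted = [210000.0, 140000.0, 330000.0]

error = rmse(actual, predicted)
print(error)
```

Because the errors are squared before averaging and the square root is taken at the end, the result is in dollars, just like the target variable.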

Classification using TensorFlow

Classification is used for data having classes as target variables. Now we will take California Census data and classify whether a person earns more than 50000 dollars or less depending on data such as education, age, occupation, marital status, gender, etc.

Let us look into the data for an overview.

import pandas as pd
census_data = pd.read_csv("census_data.csv")	

Here we can see many categorical columns that need to be taken care of. On the other hand, the income column, which is the target variable, contains strings. As TensorFlow is unable to understand strings as labels, we have to build a custom function so that it converts strings to binary labels, 0 and 1.

def labels(income_class):  # 'class' is a reserved word in Python, so the argument is renamed
    if income_class==' 

There are other ways to do that. But this approach is simple and easy to interpret.
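A complete version of such a label function could look like the sketch below. The label strings ' <=50K' and ' >50K' are assumptions made for illustration and should be checked against the actual values in the income column; the function name is hypothetical as well.

```python
def to_binary_label(income_class):
    """Map an income bracket string to 0 or 1."""
    # '<=50K' is an assumed label value; strip() removes surrounding spaces
    return 0 if income_class.strip() == '<=50K' else 1

# hypothetical raw labels as they might appear in the income column
raw_labels = [' <=50K', ' >50K', ' <=50K']
binary_labels = [to_binary_label(value) for value in raw_labels]
print(binary_labels)
```

With pandas, the same function could be passed to the apply() method of the income column to convert the whole column at once.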

We will start splitting the data for training and testing.

from sklearn.model_selection import train_test_split
x_data = census_data.drop('income_bracket',axis=1)
y_labels = census_data['income_bracket']
X_train, X_test, y_train, y_test=train_test_split(x_data, y_labels,test_size=0.3,random_state=101)

After that, we must take care of the categorical variables and numeric features.

gender_data=tf.feature_column.categorical_column_with_vocabulary_list("gender", ["Female", "Male"])
occupation_data=tf.feature_column.categorical_column_with_hash_bucket("occupation", hash_bucket_size=1000)
marital_status_data=tf.feature_column.categorical_column_with_hash_bucket("marital_status", hash_bucket_size=1000)
relationship_data=tf.feature_column.categorical_column_with_hash_bucket("relationship", hash_bucket_size=1000)
education_data=tf.feature_column.categorical_column_with_hash_bucket("education", hash_bucket_size=1000)
workclass_data=tf.feature_column.categorical_column_with_hash_bucket("workclass", hash_bucket_size=1000)
native_country_data=tf.feature_column.categorical_column_with_hash_bucket("native_country", hash_bucket_size=1000)
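The two kinds of categorical columns used above differ in how a string is turned into an integer id. The sketch below illustrates the idea in plain Python. It uses zlib.crc32 as a stand-in hash, which is an assumption for illustration and not the hash function TensorFlow uses, and both function names are hypothetical.

```python
import zlib

def vocabulary_index(value, vocabulary):
    """Vocabulary list: every known category gets a fixed position in the list."""
    return vocabulary.index(value)

def hash_bucket(value, hash_bucket_size):
    """Hash bucket: any string is hashed into one of hash_bucket_size ids."""
    return zlib.crc32(value.encode('utf-8')) % hash_bucket_size

gender_id = vocabulary_index('Male', ['Female', 'Male'])
occupation_id = hash_bucket('Sales', 1000)
print(gender_id, occupation_id)
```

A vocabulary list suits columns with a few known values, while a hash bucket avoids listing every category at the cost of possible collisions between different strings.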

Now we will take care of the feature columns containing numeric values.

age_data = tf.feature_column.numeric_column("age")

Now we will combine all these variables and put these into a list.


Now all the preprocessing part is done, and our data is ready. Let us create an input function and fit the model.


Let us train the model for at least 5000 steps.


After the training, it is time to predict the outcome


This will produce a generator that needs to be converted to a list to look into the predictions.

predicted_data = list(classifier.predict(input_fn=pred_fn))

The prediction is done. Now let us take a single test example and look into its prediction.

{'class_ids': array([0], dtype=int64),
 'classes': array([b'0'], dtype=object),
 'logistic': array([ 0.21327116], dtype=float32),
 'logits': array([-1.30531931], dtype=float32),
 'probabilities': array([ 0.78672886,  0.21327116], dtype=float32)}

From the above dictionary, we need only class_ids to compare with the real test data. Let us extract that.

final_predictions = []
for pred in predicted_data:
    final_predictions.append(pred['class_ids'][0])
final_predictions[:10]

This will give the first 10 predictions.

[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
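The fields of the prediction dictionary shown earlier are related to each other. Assuming the usual binary setup, where the logistic value is the sigmoid of the logit, the following sketch recomputes the probabilities and the class id from the logit value printed in the dictionary.

```python
import math

logit = -1.30531931  # value taken from the 'logits' field of the example prediction

# sigmoid turns the logit into the probability of class 1
prob_class_1 = 1.0 / (1.0 + math.exp(-logit))
prob_class_0 = 1.0 - prob_class_1

# the predicted class is the one with the higher probability
class_id = 1 if prob_class_1 > 0.5 else 0
print(prob_class_0, prob_class_1, class_id)
```

The recomputed values agree with the 'logistic', 'probabilities', and 'class_ids' fields of the dictionary, which is why class_ids alone is enough for comparing against the test labels.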

To go beyond an intuitive look at the predictions, we will evaluate the model.

from sklearn.metrics import classification_report
print(classification_report(y_test, final_predictions))

Now we can look into the metrics such as precision and recall to evaluate how our model performed.

The model performed better for people whose income is less than 50K dollars than for those earning more than 50K dollars. That’s it for now. This is how TensorFlow is used when we perform regression and classification.
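Precision and recall, the metrics mentioned above, can be computed by hand as in the following sketch. The predicted labels are the first 10 predictions shown earlier; the true labels are invented for illustration, since the real test labels are not shown in this article.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for one class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

y_pred = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # first 10 predictions from above
y_true = [0, 0, 1, 1, 1, 0, 1, 0, 0, 0]  # hypothetical true labels

precision, recall = precision_recall(y_true, y_pred, positive=1)
print(precision, recall)
```

In this made-up case, every predicted high earner is correct (high precision), but most actual high earners are missed (low recall), which is the kind of imbalance the classification report makes visible.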

Saving and Loading a Model

TensorFlow provides a feature to save and load a model. After saving a model, we are able to reuse it later without running the entire training code again. Let us illustrate the concept with an example.

We will be using a regression example with some made-up data. For that, let us import all the necessary libraries.

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

Now the regression works on a straight-line equation, which is y = mx + c, where m is the slope and c is the intercept.

We will create some made-up data for x and y.

x = np.linspace(0,10,10) + np.random.uniform(-1.5,1.5,10)
# example values of x:
# array([ 0.04919588,  1.32311387,  0.8076449 ,  2.3478983 ,  5.00027539,
#         6.55724614, 6.08756533, 8.95861702, 9.55352047, 9.06981686])
y = np.linspace(0,10,10) + np.random.uniform(-1.5,1.5,10)

Now it is time to plot the data to see whether it is linear or not.


Let us now add the variables, which are the coefficient and the bias.

m = tf.Variable(0.39)
c = tf.Variable(0.2)

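Now we have to define a cost function, which in our case is the mean squared error between the actual and the predicted values.

# the errors are squared so that positive and negative errors do not cancel out
error = tf.reduce_mean(tf.square(y - (m*x + c)))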

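Now let us define an optimizer and use it to train the model by minimizing the error.

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)  # assumed optimizer and learning rate
train = optimizer.minimize(error)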
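What the optimizer does behind the scenes can be sketched with NumPy. This is a hand-written gradient descent loop that uses the mean squared error as the cost. The data, the learning rate, and the number of steps are assumptions chosen for illustration, and the loop is not how TensorFlow implements minimize().

```python
import numpy as np

# noise-free toy data on the line y = 2x + 1 (hypothetical values)
x = np.linspace(0, 10, 10)
y = 2.0 * x + 1.0

m, c = 0.39, 0.2          # same starting values as the variables above
learning_rate = 0.01      # assumed value

for _ in range(5000):
    y_hat = m * x + c
    # gradients of mean((y - y_hat)**2) with respect to m and c
    grad_m = -2.0 * np.mean((y - y_hat) * x)
    grad_c = -2.0 * np.mean(y - y_hat)
    m -= learning_rate * grad_m
    c -= learning_rate * grad_c

print(m, c)
```

Each step moves the slope and the intercept a little in the direction that lowers the error, which is what repeatedly running the train operation does inside a session.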

Now, as we have already discussed, we need to create the global variables initializer before running anything in TensorFlow.

init = tf.global_variables_initializer()

Now let us create a Saver object, which will be used to save the model.

saver = tf.train.Saver()

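Now we will create and run the session, and use the saver variable to write a checkpoint.

with tf.Session() as sess:
    sess.run(init)
    epochs = 100
    for i in range(epochs):
        sess.run(train)
    # fetching back the results
    final_slope, final_intercept = sess.run([m,c])
    # saving a checkpoint; the path is an assumed example
    saver.save(sess, './my_model.ckpt')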

Now the model is saved to a checkpoint. Now let us evaluate the result.

x_test = np.linspace(-1,11,10)
y_prediction_plot = final_slope*x_test + final_intercept

Now it’s time to load the model. Let us load the model and restore the checkpoint to see whether we get the result or not.

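with tf.Session() as sess:
    # For restoring the model; the path is an assumed example and must match the one used when saving
    saver.restore(sess, './my_model.ckpt')
    # Let us fetch back the result
    restore_slope, restore_intercept = sess.run([m,c])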

Now let us plot again with the restored parameters.

x_test = np.linspace(-1,11,10)
y_prediction_plot = restore_slope*x_test + restore_intercept
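The idea behind saving and restoring can be shown without TensorFlow. The sketch below stores two parameters in a JSON file and reads them back. It is only an analogy for a checkpoint, not the format tf.train.Saver uses, and the parameter values and file name are made up.

```python
import json
import os
import tempfile

# hypothetical trained parameters
params = {"slope": 0.98, "intercept": 0.12}

# save the parameters to a file in a temporary directory
checkpoint_path = os.path.join(tempfile.mkdtemp(), "linear_model.json")
with open(checkpoint_path, "w") as f:
    json.dump(params, f)

# later, load them back without repeating the training
with open(checkpoint_path) as f:
    restored = json.load(f)

print(restored["slope"], restored["intercept"])
```

As with a real checkpoint, the restored values are identical to the saved ones, so predictions made with them match those made right after training.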

Optimizers: An Overview

When we take an interest in building a deep learning model, it is necessary to understand the concept of optimizers. Optimizers help us to reduce the value of the cost function used in the model. The cost function is nothing but the error function that we want to reduce during model building, and it largely depends on the model’s internal parameters. For example, every regression equation contains a weight and a bias. Optimizers play a crucial role in finding the optimal values of these parameters to increase the accuracy of the model.

Optimizers generally fall into two categories.

  1. First Order Optimizers
  2. Second Order Optimizers.

First Order Optimizers use the gradient of the loss function to update the parameters. A gradient is a rate of change: it tells us how the loss changes with respect to each parameter. A commonly used first-order optimizer is the Gradient Descent Optimizer.

On the other hand, second-order optimizers minimize the loss function by using second-order derivatives. They are much more time consuming and need much more computing power compared to first-order optimizers. Hence, they are less used.

Some of the commonly used optimizers are:

SGD (Stochastic Gradient Descent)

If we have 50000 data points with 10 features, plain gradient descent must process 50000*10 values on each iteration. So, if we consider 500 iterations for building a model, it will take 50000*10*500 computations to complete the process. For this huge amount of processing, SGD or stochastic gradient descent comes into play. It generally takes a single data point (or a small batch) per iteration, computes the loss and its gradient on that point only, and so reduces the computation needed for each update.
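A minimal sketch of stochastic gradient descent in plain Python follows. The toy data, the learning rate, and the number of steps are assumptions for illustration. The point is that each update looks at one randomly chosen data point instead of the whole dataset.

```python
import random

random.seed(0)

# toy data on the line y = 3x (hypothetical values)
data = [(i / 10.0, 3.0 * i / 10.0) for i in range(1, 11)]

w = 0.0               # single weight to learn
learning_rate = 0.1   # assumed value

for step in range(2000):
    x_i, y_i = random.choice(data)         # one data point per update
    grad = -2.0 * (y_i - w * x_i) * x_i    # gradient of (y_i - w * x_i)**2
    w -= learning_rate * grad

print(w)
```

Each update costs the same no matter how large the dataset is, which is what makes the approach attractive for the 50000-point example above.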


Adam

Adam stands for Adaptive Moment Estimation. It maintains a separate, adaptive learning rate for each parameter, based on running estimates of the first and second moments of the gradients. In some optimizers, the learning rates keep on decreasing because squared gradients are accumulated, and at some point they decay almost to zero. Adam takes care of that: it prevents both high variance of the parameter updates and disappearing learning rates, also known as decaying learning rates.
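The Adam update rule can be sketched for a single parameter as follows. The hyperparameter defaults are the commonly quoted ones except for the learning rate, which is enlarged here so the toy problem converges quickly; the function name and the quadratic test function are made up for illustration.

```python
import math

def adam_minimize(grad_fn, theta, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a one-dimensional function given its gradient, using Adam updates."""
    m = 0.0  # running average of the gradients (first moment)
    v = 0.0  # running average of the squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# minimize f(theta) = (theta - 4)^2, whose gradient is 2 * (theta - 4)
theta_min = adam_minimize(lambda th: 2.0 * (th - 4.0), theta=0.0)
print(theta_min)
```

Because the step is divided by the square root of a decaying average rather than a growing sum, the effective learning rate does not shrink to zero over time.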


Adagrad

This optimizer is suitable for sparse data, as it adapts the learning rate for each parameter individually. We do not need to tune the learning rate manually. But it has the drawback of a vanishing learning rate, because the squared gradients are accumulated at every iteration.
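The vanishing learning rate caused by gradient accumulation can be seen in a few lines. This sketch tracks only the effective learning rate of an Adagrad-style update for one parameter; the function name and the constant gradients are assumptions for illustration.

```python
import math

def adagrad_effective_rates(gradients, lr=0.1, eps=1e-8):
    """Return the effective learning rate after each gradient is accumulated."""
    accumulated = 0.0
    rates = []
    for g in gradients:
        accumulated += g * g   # squared gradients are summed and never forgotten
        rates.append(lr / (math.sqrt(accumulated) + eps))
    return rates

adagrad_rates = adagrad_effective_rates([1.0] * 5)
print(adagrad_rates)
```

Even with constant gradients, the effective rate keeps falling, because the accumulated sum in the denominator can only grow.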


RMSprop

It is similar to Adagrad, as it also scales the learning rate on every step using the past gradients, but it uses a moving average of them instead of a running sum. It does not work well on large datasets and violates the rules SGD optimizers use.
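For comparison, here is a sketch of the effective learning rate in an RMSprop-style update, where a decaying average of the squared gradients replaces the running sum. The decay value of 0.9, the function name, and the constant gradients are assumptions for illustration.

```python
import math

def rmsprop_effective_rates(gradients, lr=0.1, decay=0.9, eps=1e-8):
    """Return the effective learning rate after each gradient is averaged in."""
    avg_sq = 0.0
    rates = []
    for g in gradients:
        avg_sq = decay * avg_sq + (1 - decay) * g * g   # old gradients fade out
        rates.append(lr / (math.sqrt(avg_sq) + eps))
    return rates

rmsprop_rates = rmsprop_effective_rates([1.0] * 200)
print(rmsprop_rates[0], rmsprop_rates[-1])
```

With constant gradients, the effective rate settles near the base learning rate instead of shrinking towards zero.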

Let’s try out these optimizers using Keras. If you are confused, Keras is a high-level deep learning API that is bundled with TensorFlow and is used to build deep learning models. So, you see, everything is linked.

We will be using a logistic regression model which involves only two classes. We will just focus on the optimizers without going deep into the entire model.

Let us import the libraries and set a learning rate

from keras.optimizers import SGD, Adam, Adagrad, RMSprop
dflist = []
optimizers = ['SGD (lr=0.01)',
              'SGD (lr=0.01, momentum=0.3)',
              'SGD (lr=0.01, momentum=0.3, nesterov=True)',  

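Now we will compile a model with each optimizer, train it, and record the training history.

from keras.models import Sequential
from keras.layers import Dense

for opt_name in optimizers:
    model = Sequential()
    model.add(Dense(1, input_shape=(4,), activation='sigmoid'))
    # assumes each entry of the list is a valid optimizer constructor call, turned into an object with eval()
    model.compile(loss='binary_crossentropy', optimizer=eval(opt_name), metrics=['accuracy'])
    h = model.fit(X_train, y_train, batch_size=16, epochs=5, verbose=0)
    dflist.append(pd.DataFrame(h.history, index=h.epoch))
historydf = pd.concat(dflist, axis=1)
metrics_reported = dflist[0].columns
idx = pd.MultiIndex.from_product([optimizers, metrics_reported],
                                 names=['optimizers', 'metric'])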

Now we will plot and look at the performances of the optimizers.

historydf.columns = idx
ax = plt.subplot(211)
historydf.xs('loss', axis=1, level='metric').plot(ylim=(0,1), ax=ax)

If we look at the graph, we can see that the ADAM optimizer performed the best and SGD the worst. It still depends on the data.

ax = plt.subplot(212)
historydf.xs('acc', axis=1, level='metric').plot(ylim=(0,1), ax=ax)

In terms of accuracy, we can also see Adam Optimizer performed the best. This is how we can play around with the optimizers to build the best model.

Difference between RNN & CNN

  • CNN is suitable for spatial data such as images. RNN is suitable for temporal data, also called sequential data.
  • CNN is considered to be more powerful than RNN. RNN includes less feature compatibility when compared to CNN.
  • CNN takes fixed-size inputs and generates fixed-size outputs. RNN can handle arbitrary input/output lengths.
  • CNN is a type of feed-forward artificial neural network with variations of multi-layer perceptrons designed to use minimal amounts of preprocessing. RNNs, unlike feed-forward neural networks, can use their internal memory to process arbitrary sequences of inputs.
  • CNN uses the connectivity pattern between the neurons. This is inspired by the organization of the animal visual cortex, whose individual neurons are arranged in such a way that they respond to overlapping regions tiling the visual field. Recurrent neural networks use time-series information: what a user spoke last would impact what he/she will speak next.
  • CNN is ideal for image and video processing. RNN is ideal for text and speech analysis.
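The point about internal memory and arbitrary input lengths can be illustrated with a tiny recurrent cell in plain Python. The weights and the function name are made up, and a real RNN layer has weight matrices and is trained rather than fixed. The sketch only shows that one set of weights is reused at every time step, so sequences of any length can be processed.

```python
import math

def rnn_final_state(sequence, w_input=0.5, w_state=0.8):
    """Run a one-unit recurrent cell over a sequence and return its last state."""
    state = 0.0  # the internal memory starts empty
    for value in sequence:
        # the new state depends on the current input and on the previous state
        state = math.tanh(w_input * value + w_state * state)
    return state

short_state = rnn_final_state([1.0, 0.5])
long_state = rnn_final_state([1.0] * 50)
print(short_state, long_state)
```

A feed-forward network, by contrast, has one weight per input position and therefore expects inputs of a fixed size.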

Libraries & Extensions

Tensorflow has the following libraries and extensions to build advanced models or methods. 
1. Model optimization
2. TensorFlow Graphics
3. Tensor2Tensor
4. Lattice
5. TensorFlow Federated
6. Probability
7. TensorFlow Privacy
8. TensorFlow Agents
9. Dopamine
10. TRFL
11. Mesh TensorFlow
12. Ragged Tensors
13. Unicode Ops
14. TensorFlow Ranking
15. Magenta
16. Nucleus
17. Sonnet
18. Neural Structured Learning
19. TensorFlow Addons
20. TensorFlow I/O

What are the Applications of TensorFlow?

  • Google uses Machine Learning in almost all of its products: Google has one of the most exhaustive collections of data in the world, and it would obviously like to make the best use of it. Also, if all the different kinds of teams working on artificial intelligence (researchers, programmers, and data scientists) could use the same set of tools and thereby collaborate with each other, all their work could be made much simpler and more efficient. As technology developed and needs widened, such a toolset became a necessity. Motivated by this necessity, Google created TensorFlow, a solution it had long been waiting for.
  • TensorFlow bundles together Machine Learning models and algorithms, and Google uses it to enhance the efficiency of its products: improving its search engine, giving us recommendations, translating between any of the 100+ supported languages, and more.

What is Machine Learning?

A computer can perform various functions and tasks relying on inference and patterns as opposed to conventional methods like feeding explicit instructions, etc. The computer employs statistical models and algorithms to perform these functions. The study of such algorithms and models is termed Machine Learning.
Deep learning is another term that one has to be familiar with. A subset of Machine Learning, deep learning is a class of algorithms that can extract higher-level features from the raw input. Or in simple words, they are algorithms that teach a machine to learn from examples and previous experiences. 
Deep learning is based on the concept of Artificial Neural Networks (ANNs). Developers use TensorFlow to create neural networks with many layers. ANNs attempt to mimic the human nervous system to a good extent by using silicon and wires. The aim is to develop a system that can interpret and solve real-world problems like a human brain.

  • It is free and open-sourced: TensorFlow is an Open-Source Software released under the Apache License. An Open Source Software, OSS, is a kind of computer software where the source code is released under a license that enables anyone to access it. This means that the users can use this software library for any purpose — distribute, study and modify — without actually having to worry about paying royalties.
  • When compared to other such Machine Learning Software Libraries — Microsoft’s CNTK or Theano — TensorFlow is relatively easy to use. Thus, even new developers with no significant understanding of machine learning can now access a powerful software library instead of building their models from scratch.
  • Another factor that adds to its popularity is the fact that it is based on graph computation. Graph computation allows the programmer to visualize the neural network being developed. This can be achieved through the use of TensorBoard, which comes in handy while debugging the program. TensorBoard is an important feature of TensorFlow, as it helps monitor the activities of TensorFlow visually. Also, the programmer is given an option to save the graph for later use.


Below are listed a few of the use cases of TensorFlow:

  • Voice and speech recognition: The real challenge put before programmers was that mere words would not be enough. Since words change meaning with context, a clear understanding of what the word represents with respect to the context is necessary. This is where deep learning plays a significant role. With the help of Artificial Neural Networks (ANNs), this has been made possible by performing word recognition, phoneme classification, etc.

Thus with the help of TensorFlow, artificial intelligence-enabled machines can now be trained to receive human voice as input, decipher and analyze it, and perform the necessary tasks. A number of applications make use of this feature. They need this feature for voice search, automatic dictation, and more.
Let us take the case of Google’s search engine as an example. While you are using the search engine, Google applies machine learning using TensorFlow to predict the next word you are about to type. Considering how accurate these predictions often are, one can understand the level of sophistication and complexity involved in the process.

  • Image recognition: Apps that use image recognition technology have probably done the most to popularize deep learning among the masses. The technology was developed with the intention of training computers to see, identify, and analyze the world the way a human would. Today, a number of applications find it useful: the artificial intelligence-enabled camera on your mobile phone, the social networking sites you visit, and your telecom operators, to name a few.

In image recognition, Deep Learning trains the system to identify a certain image by exposing it to several images labeled manually. It is to be noted that the system learns to identify an image by learning from previously shown examples and not with the help of instructions saved in it on how to identify that particular image.
Take the case of Facebook’s image recognition system, DeepFace. It was trained in a similar way to identify human faces. When you tag someone in a photo that you have uploaded on Facebook, this technology is what makes it possible. 
Another commendable development is in the field of Medical Science. Deep learning has made great progress in the field of healthcare — especially in the field of Ophthalmology and Digital Pathology. By developing a state-of-the-art computer vision system, Google was able to develop computer-aided diagnostic screening that could detect certain medical conditions that would otherwise have required a diagnosis from an expert. Even with significant expertise in the area, considering the tedious work one has to go through, the diagnosis varies from person to person. Also, in some cases, the condition might be too dormant to be detected by a medical practitioner. Such an occasion won’t arise here because the computer is designed to detect complex patterns that may not be visible to a human observer.    
TensorFlow helps deep learning systems perform image recognition efficiently. The main advantage of using TensorFlow is that it helps to identify and categorize arbitrary objects within a larger image. This is also used for the purpose of identifying shapes for modeling purposes.

  • Time series: The most common application of time series is in recommendations. If you are someone using Facebook, YouTube, Netflix, or any other entertainment platform, then you may be familiar with this concept. For those who do not know, a recommendation is a list of videos or articles that the service provider believes suits you the best. TensorFlow time series algorithms are what they use to derive meaningful statistics from your history.

Another example is how PayPal uses the TensorFlow framework to detect fraud and offer secure transactions to its customers. PayPal has successfully been able to identify complex fraud patterns and has increased its fraud decline accuracy with the help of TensorFlow. The increased precision in identification has enabled the company to offer an enhanced experience to its customers. 

A Way Forward

With the help of TensorFlow, Machine Learning has already surpassed the heights that we once thought to be unattainable. There is hardly a domain in our life where a technology built with this framework’s help has no impact.
From healthcare to the entertainment industry, the applications of TensorFlow have widened the scope of artificial intelligence in every direction in order to enhance our experiences. Since TensorFlow is an Open-Source Software library, it is just a matter of time for new and innovative use cases to catch the headlines.

  • What is TensorFlow used for?

TensorFlow is a software tool for Deep Learning. It is an artificial intelligence library that allows developers to create large-scale multi-layered neural networks. It is used in Classification, Recognition, Perception, Discovering, Prediction, Creation, etc. Some of the primary use cases are Sound Recognition, Image recognition, etc.

  • What language is used for TensorFlow?

TensorFlow provides APIs in several languages. The most widely used is Python, because it is the most complete and the easiest to use. The APIs for other languages, like C++, Java, etc., are not covered by the API stability promises.

  • Do you need math for TensorFlow?

Writing ordinary code in TensorFlow does not require much math. However, if you are trying to add or implement new features, the answer is yes. The math that is required is linear algebra and statistics. If you know the basics of these, then you can easily go ahead with implementation.

If you know Deep Learning, machine learning, and programming languages like Python and C++, then Basic TensorFlow can be learned in 1-2 months. It is quite complex and might discourage you from pursuing it, but that makes it very powerful. It might take 1-2 years to master TensorFlow. 

  • Where is TensorFlow mostly used?

TensorFlow is mostly used in Voice/Sound Recognition, text-based applications that work on sentiment analysis, Image Recognition, Video Detection, etc.

  • Why is TensorFlow written in Python?

TensorFlow’s primary API is in Python because it is the most complete and the easiest to use of the TensorFlow APIs. It provides convenient ways to implement high-level abstractions that can be coupled together. Also, nodes and tensors in TensorFlow are Python objects, and the applications are themselves Python applications.

  • Is TensorFlow good for beginners?

If you have a good understanding of Machine learning, deep learning, and programming languages like Python, then as a beginner, Tensorflow basics can be learned in 1-2 months. It is difficult to master it in a short time as it is very powerful and complex. 

  • What is TensorFlow written in?

Although TensorFlow exposes nodes and tensors as Python objects, the core of TensorFlow is written in highly optimized C++ and CUDA (Nvidia’s GPU programming platform).

  • Why is TensorFlow so popular?

TensorFlow is a very powerful framework that provides many functionalities and services compared to other frameworks. These high-level functionalities help advance parallel computation and build complex neural network models. Hence, it is very popular.

Source : https://www.mygreatlearning.com/blog/artificial-intelligence/