
Updated: Mar 29, 2022

Convolutional Neural Networks (CNNs) are used for computer vision, which means detecting patterns in visual data. For example:

  • Classifying whether a picture of food shows pizza or bread.

  • Detecting specific objects in a security camera feed.

In this blog, we'll learn how to build a CNN to detect a visual object.


Get Data


Preparing data is the most important part of any deep learning project. We are going to work with the Food-101 dataset, a collection of 101,000 real-world images of food dishes across 101 categories (1,000 images per category). To simplify the scenario, we'll choose two of the categories, pizza and steak, to build a binary classifier. We are thankful to Daniel Bourke for preparing the pizza and steak subset.


Import the Data


First, we need to download the data from storage:

import zipfile

# Download zip file of pizza_steak images
!wget https://storage.googleapis.com/ztm_tf_course/food_vision/pizza_steak.zip 

# Unzip the downloaded file
zip_ref = zipfile.ZipFile("pizza_steak.zip", "r")
zip_ref.extractall()
zip_ref.close()

Inspect the Data


A crucial step at the beginning of a machine learning project is to inspect and visualize the data. Let's inspect the data we've just downloaded.

!ls pizza_steak

!ls pizza_steak/train/

!ls pizza_steak/train/steak/


There are many images; now let's find out how many there are for train and test.

import os

# Walk through pizza_steak directory and list number of files
for dirpath, dirnames, filenames in os.walk("pizza_steak"):
  print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")

Get the class names programmatically (this is much more helpful with a longer list of classes):

import pathlib
import numpy as np
data_dir = pathlib.Path("pizza_steak/train/") # turn our training path into a Python path
class_names = np.array(sorted([item.name for item in data_dir.glob('*')])) # created a list of class_names from the subdirectories
print(class_names)


So, we have 750 training images and 250 test images per class for pizza and steak. Now let's create a function to visualize random images:

# View an image
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import random

def view_random_image(target_dir, target_class):
  # Setup target directory (we'll view images from here)
  target_folder = target_dir+target_class

  # Get a random image path
  random_image = random.sample(os.listdir(target_folder), 1)

  # Read in the image and plot it using matplotlib
  img = mpimg.imread(target_folder + "/" + random_image[0])
  plt.imshow(img)
  plt.title(target_class)
  plt.axis("off");

  print(f"Image shape: {img.shape}") # show the shape of the image

  return img

Using this function, let's visualize an image from a target class (steak or pizza):

# View a random image from the training dataset
img = view_random_image(target_dir="pizza_steak/train/",
                        target_class="steak")

img = view_random_image(target_dir="pizza_steak/train/",
                        target_class="pizza")

We can repeat this to view more images and get an idea of what we're working with. Now let's see the image in the form of a big array/tensor and view its shape.

# View the img (actually just a big array/tensor) and its shape (width, height, colour channels)
img, img.shape


Now look at the image shape; it's in the form (width, height, colour channels). We can notice all the values of the image array are between 0 and 255, because that's the possible range of red, green, and blue values. So when we build a model to differentiate between our images of pizza and steak, it will be finding patterns in these different pixel values which determine what each class looks like.

As we discussed before, machine learning models prefer values between 0 and 1, so one of the most common preprocessing steps for working with images is to scale the pixel values by dividing them by 255.

# Get all the pixel values between 0 & 1
img/255.


The architecture of a convolutional neural network (typical)


Why typical? Convolutional neural networks can be created in many different ways; here we'll discuss the more traditional style of CNN.


Hyperparameter/layer type, what it does, and typical values:

  • Input image(s): the target images you'd like to discover patterns in. Typically a photo or video.

  • Input layer: takes in the target image and preprocesses it. Typical value: input_shape = [batch_size, image_height, image_width, color_channels].

  • Convolution layer: extracts/learns the most important features from target images. Typically multiple, created with tf.keras.layers.ConvXD (X can be multiple values).

  • Hidden activation: adds non-linearity to learned features (non-straight lines). Usually ReLU (tf.keras.activations.relu).

  • Pooling layer: reduces the dimensionality of learned image features. Typically average (tf.keras.layers.AvgPool2D) or max (tf.keras.layers.MaxPool2D).

  • Fully connected layer: further refines learned features from convolution layers. tf.keras.layers.Dense.

  • Output layer: takes learned features and outputs them in the shape of the target labels. output_shape = [number_of_classes].

  • Output activation: adds non-linearity to the output layer. Typically tf.keras.activations.sigmoid (binary classification) or tf.keras.activations.softmax (multiclass classification).


Resource: The architecture we're using below is a scaled-down version of VGG-16, a convolutional neural network that came 2nd in the 2014 ImageNet classification competition.


Binary classification


We're going to go through the following whirlwind of steps:

  1. Become one with the data (visualize, visualize, visualize...)

  2. Preprocess the data (prepare it for a model)

  3. Create a model (start with a baseline)

  4. Fit the model

  5. Evaluate the model

  6. Adjust different parameters and improve the model (try to beat your baseline)

  7. Repeat until satisfied

Let's step through each.



Prepare Data


Let's prepare data for our convolutional neural network (CNN) experiments.

One of the most important steps for a machine learning project is creating training and test sets. In our case, our data is already split into training and test sets. Another option here might be to create a validation set as well, but we'll leave that for now. For an image classification project, it's standard to have your data separated into train and test directories, with a subfolder for each class.


A batch is a small subset of the dataset a model looks at during training. For example, rather than looking at 10,000 images at one time and trying to figure out the patterns, a model might only look at 32 images at a time.

It does this for a couple of reasons:

  • 10,000 images (or more) might not fit into the memory of your processor (GPU).

  • Trying to learn the patterns in 10,000 images in one hit could result in the model not being able to learn very well.

Why 32?

There are many different batch sizes you could use but 32 has proven to be very effective in many different use cases and is often the default for many data preprocessing functions.

To turn our data into batches, we'll first create an instance of ImageDataGenerator for each of our datasets.


import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Set the seed
tf.random.set_seed(42)

# Preprocess data (get all of the pixel values between 0 and 1, also called scaling/normalization)
train_datagen = ImageDataGenerator(rescale=1/255.)
test_datagen = ImageDataGenerator(rescale=1/255.)

# Setup the train and test directories
train_dir = "pizza_steak/train/"
test_dir = "pizza_steak/test/"

# Import data from directories and turn it into batches
# Turn it into batches
train_data = train_datagen.flow_from_directory(directory=train_dir,
                                               target_size=(224, 224),
                                               class_mode='binary',
                                               batch_size=32)

test_data = test_datagen.flow_from_directory(directory=test_dir,
                                             target_size=(224, 224),
                                             class_mode='binary',
                                             batch_size=32)


Looks like our training dataset has 1500 images belonging to 2 classes (pizza and steak) and our test dataset has 500 images also belonging to 2 classes.

Some things to note here:

  • Due to how our directories are structured, the classes get inferred by the subdirectory names in train_dir and test_dir.

  • The target_size parameter defines the input size of our images in (height, width) format.

  • The class_mode value of 'binary' defines our classification problem type. If we had more than two classes, we would use 'categorical'.

  • The batch_size defines how many images will be in each batch, we've used 32 which is the same as the default.

We can take a look at our batched images and labels by inspecting the train_data object.


# Get a sample of the training data batch 
images, labels = train_data.next() # get the 'next' batch of images/labels
len(images), len(labels)


It seems our images and labels are in batches of 32.

How about the labels?

# View the first batch of labels
labels


Due to the class_mode parameter being 'binary' our labels are either 0 (pizza) or 1 (steak).
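We can confirm this mapping by checking the generator's class_indices attribute:

# Which label belongs to which class?
train_data.class_indices # {'pizza': 0, 'steak': 1}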


Create Model


A simple heuristic for computer vision models is to use the model architecture performing best on ImageNet (a large collection of diverse images used to benchmark computer vision models). However, to begin with, it's good to build a smaller model to establish a baseline result. Let's create a small model to get that baseline and then try to improve upon it.

# Make the creating of our model a little easier
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPool2D, Activation
from tensorflow.keras import Sequential

# Create the model (this can be our baseline, a 3 layer Convolutional Neural Network)
model_1 = Sequential([
  Conv2D(filters=10, 
         kernel_size=3, 
         strides=1,
         padding='valid',
         activation='relu', 
         input_shape=(224, 224, 3)), # input layer (specify input shape)
  Conv2D(10, 3, activation='relu'),
  Conv2D(10, 3, activation='relu'),
  Flatten(),
  Dense(1, activation='sigmoid') # output layer (specify output shape)
])

Let's define the components of the Conv2D layer:

  • The "2D" means our inputs are two-dimensional (height and width), even though they have 3 color channels, the convolutions are run on each channel individually.

  • filters - these are the number of "feature extractors" that will be moving over our images.

  • kernel_size - the size of our filters, for example, a kernel_size of (3, 3) (or just 3) will mean each filter will have the size 3x3, meaning it will look at a space of 3x3 pixels each time. The smaller the kernel, the more fine-grained features it will extract.

  • strides - the number of pixels a filter moves as it crosses the image. A stride of 1 means the filter moves across 1 pixel at a time; a stride of 2 means it moves 2 pixels at a time.

  • padding - this can be either 'same' or 'valid'. 'same' pads the outside of the image with zeros so the output of the convolutional layer is the same size as the input, whereas 'valid' (the default) only applies the filter where it fully fits, so the output shrinks (e.g. a 3x3 kernel over a 224-pixel-wide image produces a 222-pixel-wide output).
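To see the difference, here's a quick sketch comparing the two padding modes on a dummy image-sized batch (using the tf imported above):

# Compare output sizes for 'valid' vs 'same' padding
dummy = tf.random.normal([1, 224, 224, 3]) # a fake batch of one image
print(tf.keras.layers.Conv2D(10, 3, padding='valid')(dummy).shape) # (1, 222, 222, 10)
print(tf.keras.layers.Conv2D(10, 3, padding='same')(dummy).shape)  # (1, 224, 224, 10)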


Compile and Fit Model

# Compile the model
model_1.compile(loss='binary_crossentropy',
                optimizer=Adam(),
                metrics=['accuracy'])
# Fit the model
history_1 = model_1.fit(train_data,
                        epochs=5,
                        steps_per_epoch=len(train_data),
                        validation_data=test_data,
                        validation_steps=len(test_data))

We'll notice two new parameters used here:

  • steps_per_epoch - this is the number of batches a model will go through per epoch, in our case, we want our model to go through all batches so it's equal to the length of train_data (1500 images in batches of 32 = 1500/32 = ~47 steps)

  • validation_steps - same as above, except for the validation_data parameter (500 test images in batches of 32 = 500/32 = ~16 steps)
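We can sanity-check these step counts directly, since the data generators know how many batches they hold:

# Number of batches the model will step through each epoch
len(train_data), len(test_data) # -> (47, 16)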

Let's create a function to investigate the model's training performance (separate accuracy and loss curves):

# Plot the validation and training data separately
def plot_loss_curves(history):
  """
  Returns separate loss curves for training and validation metrics.
  """ 
  loss = history.history['loss']
  val_loss = history.history['val_loss']

  accuracy = history.history['accuracy']
  val_accuracy = history.history['val_accuracy']

  epochs = range(len(history.history['loss']))

  # Plot loss
  plt.plot(epochs, loss, label='training_loss')
  plt.plot(epochs, val_loss, label='val_loss')
  plt.title('Loss')
  plt.xlabel('Epochs')
  plt.legend()

  # Plot accuracy
  plt.figure()
  plt.plot(epochs, accuracy, label='training_accuracy')
  plt.plot(epochs, val_accuracy, label='val_accuracy')
  plt.title('Accuracy')
  plt.xlabel('Epochs')
  plt.legend(); 
# Check out the loss curves of model_1
plot_loss_curves(history_1)

Repeat until satisfied


After many model iterations, it's time to dig into our bag of tricks and try another method of overfitting prevention: data augmentation.

Data augmentation is the process of altering our training data, leading to it having more diversity and in turn allowing our models to learn more generalizable patterns. Altering might mean adjusting the rotation of an image, flipping it, cropping it or something similar.


Doing this simulates the kind of data a model might be used on in the real world.


If we're building a pizza vs. steak application, not all of the images our users take will be in setups similar to our training data. Using data augmentation gives us another way to prevent overfitting and in turn make our model more generalizable. Let's create augmented and non-augmented training data, plus unchanged test data.

# Create ImageDataGenerator training instance with data augmentation
train_datagen_augmented = ImageDataGenerator(rescale=1/255.,
                                             rotation_range=20, # rotate the image slightly between 0 and 20 degrees (note: this is an int not a float)
                                             shear_range=0.2, # shear the image
                                             zoom_range=0.2, # zoom into the image
                                             width_shift_range=0.2, # shift the image width ways
                                             height_shift_range=0.2, # shift the image height ways
                                             horizontal_flip=True) # flip the image on the horizontal axis

# Create ImageDataGenerator training instance without data augmentation
train_datagen = ImageDataGenerator(rescale=1/255.) 

# Create ImageDataGenerator test instance without data augmentation
test_datagen = ImageDataGenerator(rescale=1/255.)

# Import data and augment it from training directory
print("Augmented training images:")
train_data_augmented = train_datagen_augmented.flow_from_directory(train_dir,
                                                                   target_size=(224, 224),
                                                                   batch_size=32,
                                                                   class_mode='binary',
                                                                   shuffle=False) # Don't shuffle for demonstration purposes, usually a good thing to shuffle

# Create non-augmented data batches
print("Non-augmented training images:")
train_data = train_datagen.flow_from_directory(train_dir,
                                               target_size=(224, 224),
                                               batch_size=32,
                                               class_mode='binary',
                                               shuffle=False) # Don't shuffle for demonstration purposes

print("Unchanged test images:")

Let's visualize what data augmentation actually does to an image.


# Get data batch samples
images, labels = train_data.next()
augmented_images, augmented_labels = train_data_augmented.next() # Note: labels aren't augmented, they stay the same

# Show original image and augmented image
random_number = random.randint(0, 31) # we're making batches of size 32, so valid indices are 0-31
plt.imshow(images[random_number])
plt.title(f"Original image")
plt.axis(False)
plt.figure()
plt.imshow(augmented_images[random_number])
plt.title(f"Augmented image")
plt.axis(False);

After going through a sample of original and augmented images, you can start to see some of the example transformations on the training images.


Notice how some of the augmented images look like slightly warped versions of the original image. This means our model will be forced to try and learn patterns in less-than-perfect images, which is often the case when using real-world images.


We keep creating and tweaking models until we're satisfied with the result.


Let's see what happens when we shuffle the augmented training data.



# Import data and augment it from directories
train_data_augmented_shuffled = train_datagen_augmented.flow_from_directory(train_dir,
                                                                            target_size=(224, 224),
                                                                            batch_size=32,
                                                                            class_mode='binary',
                                                                            shuffle=True) # Shuffle data (default)


Since we've already beaten our baseline, there are a few things we could try to continue to improve our model:

  • Increase the number of model layers (e.g. add more convolutional layers).

  • Increase the number of filters in each convolutional layer (e.g. from 10 to 32, 64, or 128, these numbers aren't set in stone either, they are usually found through trial and error).

  • Train for longer (more epochs).

  • Find an ideal learning rate.

  • Get more data (give the model more opportunities to learn).

Adjusting each of these settings (except for the last one) during model development is usually referred to as hyperparameter tuning.


You can think of hyperparameter tuning as similar to adjusting the settings on your oven to cook your favorite dish. Although your oven does most of the cooking for you, you can help it by tweaking the dials. Here is our final model:

# Create a CNN model (same as Tiny VGG but for binary classification - https://poloclub.github.io/cnn-explainer/ )
model_final = Sequential([
  Conv2D(10, 3, activation='relu', input_shape=(224, 224, 3)), # same input shape as our images
  Conv2D(10, 3, activation='relu'),
  MaxPool2D(),
  Conv2D(10, 3, activation='relu'),
  Conv2D(10, 3, activation='relu'),
  MaxPool2D(),
  Flatten(),
  Dense(1, activation='sigmoid')
])

# Compile the model
model_final.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

# Fit the model
history_final = model_final.fit(train_data_augmented_shuffled,
                        epochs=5,
                        steps_per_epoch=len(train_data_augmented_shuffled),
                        validation_data=test_data,
                        validation_steps=len(test_data))

Now let's check out our TinyVGG model's performance.

plot_loss_curves(history_final)

Now our training curves are looking good; however, we could likely improve further by training a little longer.
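For example, Keras lets us resume training with the initial_epoch argument; a sketch, reusing the model and data generators from above:

# Continue training model_final for 5 more epochs (a sketch)
history_continued = model_final.fit(train_data_augmented_shuffled,
                                    epochs=10, # target total number of epochs
                                    initial_epoch=5, # resume where the first fit stopped
                                    steps_per_epoch=len(train_data_augmented_shuffled),
                                    validation_data=test_data,
                                    validation_steps=len(test_data))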


Making Predictions with Our Trained Model


What good is a trained model if you can't make predictions with it?

To really test it out, we'll upload a couple of our own images and see how the model goes; you can test with your own images as well.


The first test image we're going to use is a delicious steak.

# View our example image
!wget https://raw.githubusercontent.com/sumitdeyonline/machinelearning/main/03-steak.jpeg 
steak = mpimg.imread("03-steak.jpeg")
plt.imshow(steak)
plt.axis(False);

Check the shape of the image

# Check the shape of our image
steak.shape


Since our model takes in images of shape (224, 224, 3), we've got to reshape our custom image to use it with our model.


To do so, we can import and decode our image using tf.io.read_file (for reading files) and tf.image (for resizing our image and turning it into a tensor). Let's create a function to handle this image preparation.

# Create a function to import an image and resize it to be able to be used with our model
def load_and_prep_image(filename, img_shape=224):
  """
  Reads an image from filename, turns it into a tensor
  and reshapes it to (img_shape, img_shape, colour_channel).
  """
  # Read in target file (an image)
  img = tf.io.read_file(filename)

  # Decode the read file into a tensor & ensure 3 colour channels 
  # (our model is trained on images with 3 colour channels and sometimes images have 4 colour channels)
  img = tf.image.decode_image(img, channels=3)

  # Resize the image (to the same size our model was trained on)
  img = tf.image.resize(img, size = [img_shape, img_shape])

  # Rescale the image (get all values between 0 and 1)
  img = img/255.
  return img

Now we've got a function to load our custom image, let's load it in.

# Load in and preprocess our custom image
steak = load_and_prep_image("03-steak.jpeg")
steak


There's one more problem. Although our image is in the same shape as the images our model has been trained on, we're still missing a dimension. Remember how our model was trained in batches? Well, the batch size becomes the first dimension.

So in reality, our model was trained on data in the shape of (batch_size, 224, 224, 3).

We can fix this by adding an extra dimension to our custom image tensor using tf.expand_dims.

# Add an extra axis
print(f"Shape before new dimension: {steak.shape}")
steak = tf.expand_dims(steak, axis=0) # add an extra dimension at axis 0
#steak = steak[tf.newaxis, ...] # alternative to the above, '...' is short for 'every other dimension'
print(f"Shape after new dimension: {steak.shape}")
steak

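With the batch dimension in place, we can finally make a prediction on our custom image (a sketch using the model_final trained above):

# Make a prediction on our custom image (in batched form)
pred = model_final.predict(steak)
pred # a prediction probability between 0 and 1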

The predictions come out in prediction probability form. In other words, this means how likely the image is to be one class or another.

Since we're working with a binary classification problem, if the prediction probability is over 0.5, according to the model, the prediction is most likely to be a positive class (class 1).

And if the prediction probability is under 0.5, according to the model, the predicted class is most likely to be the negative class (class 0)
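For instance, we can round the probability to recover a class name (reusing pred and class_names from above):

# Round the prediction probability to get a predicted class name
class_names[int(tf.round(pred)[0][0])]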


Let's create a function that makes a prediction on an input image using the trained model and plots the image with the predicted class as the title.

def pred_and_plot(model, filename, class_names):
  """
  Imports an image located at filename, makes a prediction on it with
  a trained model and plots the image with the predicted class as the title.
  """
  # Import the target image and preprocess it
  img = load_and_prep_image(filename)

  # Make a prediction
  pred = model.predict(tf.expand_dims(img, axis=0))

  # Get the predicted class
  pred_class = class_names[int(tf.round(pred)[0][0])]

  # Plot the image and predicted class
  plt.imshow(img)
  plt.title(f"Prediction: {pred_class}")
  plt.axis(False);

Finally, let's test our model on the custom image:

# Test our model on a custom image
pred_and_plot(model_final, "03-steak.jpeg", class_names)

Wow, our prediction is right; our model is working. You can predict more images using the pred_and_plot function. Please try it yourself.


In the next part, we'll discuss multi-class classification with Convolutional Neural Networks (CNNs) (Part 2). Stay tuned.




Now we are moving from regression problems to classification problems. Generally, classification problems predict whether something is one thing or another.


We can describe classification problems in the following ways:

  • Predict whether or not someone has cancer based on their health parameters. This is called binary classification since there are only two options.

  • Decide whether a photo is of food, a person, or a cat. This is called multi-class classification since there are more than two options.

  • Predict what categories should be assigned to a Blog article. This is called multi-label classification since a single article could have more than one category assigned.

The architecture of a classification neural network (typical)

Why typical? There are many ways to write a neural network, depending on the type of problem you're working on. But there are some fundamentals all deep neural networks contain:

  • An input layer.

  • Some hidden layers.

  • An output layer.

Following are some standard values we'll often use in our classification neural networks.


Hyperparameter, with typical values for binary and multiclass classification:

  • Input layer shape: number of features (binary); same as binary classification (multiclass).

  • Hidden layer(s): problem specific, min = 1, max = unlimited (binary); same as binary classification (multiclass).

  • Neurons per hidden layer: problem specific, generally 10 to 100 (binary); same as binary classification (multiclass).

  • Output layer shape: 1, i.e. one class or the other (binary); 1 per class (multiclass).

  • Hidden activation: usually ReLU, the rectified linear unit (binary); same as binary classification (multiclass).

  • Output activation: sigmoid (binary); softmax (multiclass).

  • Loss function: cross entropy, tf.keras.losses.BinaryCrossentropy in TensorFlow (binary); cross entropy, tf.keras.losses.CategoricalCrossentropy in TensorFlow (multiclass).

  • Optimizer: SGD (stochastic gradient descent) or Adam (binary); same as binary classification (multiclass).

Multiclass classification with a larger example


In this section we'll experiment with multiclass classification: we'll build a neural network to predict whether a piece of clothing is a shoe, a shirt, a jacket, or something else.


To start, we'll need some data. The good thing for us is TensorFlow has a multiclass classification dataset known as Fashion MNIST built-in. Meaning we can get started straight away. We can import it using the tf.keras.datasets module.

Resource: The following multiclass classification problem has been adapted from Daniel Bourke's TensorFlow course materials.


Load data (train, test) from the Fashion MNIST dataset


import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

# The data has already been sorted into training and test sets for us

(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()

In deep learning, it's important to check the shape of our data.

# Check the shape of our data
train_data.shape, train_labels.shape, test_data.shape, test_labels.shape

There are 60,000 training examples, each with shape (28, 28) and a label, as well as 10,000 test examples of shape (28, 28). Let's visualize a single example.

# Plot a single example
import matplotlib.pyplot as plt
plt.imshow(train_data[7]);

Now check the sample's label:

# Check our sample's label
train_labels[7]

It looks like the labels are numeric. That's good for a neural network, but we'll want a human-readable format too.

Let's create a small list of the class names (we can find them on the dataset's GitHub page):


class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 
               'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
# Class names and how many classes there are (this'll be our output shape)
class_names, len(class_names)

Let's plot another example with a human-readable class name

# Plot an example image and its label
plt.imshow(train_data[17], cmap=plt.cm.binary) # change the colours to B&W 
plt.title(class_names[train_labels[17]]);

Wow! It's a T-shirt/top. Let's try a few random images from Fashion MNIST:



# Plot multiple random images of fashion MNIST
import random
plt.figure(figsize=(7, 7))
for i in range(8):
  ax = plt.subplot(4, 4, i + 1)
  rand_index = random.choice(range(len(train_data)))
  plt.imshow(train_data[rand_index], cmap=plt.cm.binary)
  plt.title(class_names[train_labels[rand_index]])
  plt.axis(False)

Create Model


Now it's time to build a model to figure out the relationship between the pixel values and their labels. Here are the input and output shapes:


The input shape will be 28x28 tensors (the height and width of the image).

The output shape will be 10 (one prediction probability per class).


After working through many modelling experiments, we came up with the following model, which uses a close-to-ideal learning rate and performed pretty well.



# Set random seed
tf.random.set_seed(42)
# Create the model
model = tf.keras.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)), # input layer (reshapes 28x28 to 784)
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(4, activation="relu"),
  tf.keras.layers.Dense(10, activation="softmax") # output shape is 10
])
  
# Compile the model
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), # ideal learning rate
              metrics=["accuracy"])

# Fit the model
history = model.fit(train_data,
                    train_labels,
                    epochs=20,
                    validation_data=(test_data, test_labels))


Now let's evaluate the model by making predictions:



# Make predictions with the most recent model
y_probs = model.predict(test_data) # "probs" is short for probabilities

# View the first 5 predictions
y_probs[:5]

These predictions aren't human-readable yet, so let's convert them into something we can interpret.
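Each row of y_probs holds 10 probabilities, one per class, so taking the argmax of each row recovers a label (a quick sketch using class_names from above):

# Convert the first 5 prediction probabilities into class names
[class_names[i] for i in y_probs[:5].argmax(axis=1)]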

Let's create a function to get a prediction about an input image.


  
# Create a function that takes an input image and its index, then plots the
# image with its prediction

def plot_random_image(model, image, index, true_labels, classes):
  """Picks the image at `index`, plots it and labels it with a predicted and truth label.

  Args:
    model: a trained model (trained on data similar to what's in images).
    image: a set of images.
    index: index inside the tensor.
    true_labels: array of ground truth labels for images.
    classes: array of class names for images.

  Returns:
    A plot of an image with a predicted class label from `model`
    as well as the truth class label from `true_labels`.
  """
  # Create predictions and targets
  target_image = image[index]
  pred_probs = model.predict(target_image.reshape(1, 28, 28)) # reshape to the input size the model expects
  pred_label = classes[pred_probs.argmax()]
  true_label = classes[true_labels[index]]

  # Plot the target image
  plt.imshow(target_image, cmap=plt.cm.binary)

  # Change the colour of the title depending on whether the prediction is right or wrong
  if pred_label == true_label:
    color = "green"
  else:
    color = "red"

  # Add xlabel information (prediction/true label)
  plt.xlabel("Prediction: {} {:2.0f}% (Actual label: {})".format(pred_label,
                                                                 100*tf.reduce_max(pred_probs),
                                                                 true_label),
             color=color) # set the colour to green or red

Our function is ready to use; now we pick any image by its index number and pass it to the function to get a prediction for that image.


Let's try image index 18 from the tensor

# Plot an example image and its label from the test data
plt.imshow(test_data[18], cmap=plt.cm.binary) # change the colours to B&W 
plt.title(class_names[test_labels[18]]);

So, index 18 of the test tensor is a Bag. Let's pass this image to our function:



# Check out the image as well as its prediction
plot_random_image(model=model, 
                  image=test_data,
                  index=18, 
                  true_labels=test_labels, 
                  classes=class_names)

Wow! The model predicts "Bag" with 88% confidence, which is correct. Now let's try a negative scenario with index 17 of the test tensor.

# Plot an example image and its label from the test data
plt.imshow(test_data[17], cmap=plt.cm.binary) # change the colours to B&W 
plt.title(class_names[test_labels[17]]); 

The actual label of this image is Coat. Let's pass this image to our function:

# Check out a random image as well as its prediction
plot_random_image(model=model, 
                  image=test_data,
                  index=17, 
                  true_labels=test_labels, 
                  classes=class_names)

It came up with a wrong prediction (Pullover); the actual label is Coat.


Did you figure out which predictions the model gets confused on?


It seems to mix up Coat and Pullover, or Sneaker and Ankle boot. The overall shapes of a Coat and a Pullover, or a Sneaker and an Ankle boot, are similar. Overall shape might be one of the patterns the model has learned, so when two images have a similar shape, their predictions get mixed up. This is very common behavior for any deep learning model.
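One way to check this systematically is a confusion matrix; here's a sketch (scikit-learn is an extra dependency not used elsewhere in this post):

# Which classes get mixed up? Rows are true labels, columns are predictions
from sklearn.metrics import confusion_matrix
pred_labels = model.predict(test_data).argmax(axis=1)
print(confusion_matrix(test_labels, pred_labels))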




Continued from Part 1


Create a little bit of a bigger dataset and a new model


We'll create a slightly bigger dataset (using NumPy) and build new models to compare their predictions.


Let's create the bigger Input Dataset


import tensorflow as tf
import numpy as np
# Make a bigger dataset
X = np.arange(-100, 100, 4)
X




Bigger Output Dataset


# Make labels for the dataset (adhering to the same pattern as before)
y = np.arange(-90, 110, 4)
y



Since y = X + 10, we can make the labels like so:

# Same result as above
y = X + 10
y



Split Dataset into training/test dataset

One of the other most common and important steps in a machine learning project is creating a training and test set (and, when required, a validation set). Each set serves a different purpose:

  • Training set - the model learns from this data, which is typically 70-80% of the total data available (like the course materials you study during the semester).

  • Validation set - the model gets tuned on this data, which is typically 10-15% of the total data available (like the practice exam you take before the final exam).

  • Test set - the model gets evaluated on this data to test what it has learned, it's typically 10-15% of the total data available (like the final exam you take at the end of the semester).

Now let's create training and test datasets by splitting the X and y arrays.

# Check how many samples we have
len(X), len(y)


# Split data into train and test sets
X_train = X[:40] # first 40 examples (80% of data)
y_train = y[:40]
X_test = X[40:] # last 10 examples (20% of data)
y_test = y[40:]

len(X_train), len(X_test)


Visualizing the data


Now we have to visualize our data; let's create a nice colorful plot using the matplotlib.pyplot library.



import matplotlib.pyplot as plt
plt.figure(figsize=(10, 7))
# Plot training data in blue
plt.scatter(X_train, y_train, c='b', label='Training data')
# Plot test data in green
plt.scatter(X_test, y_test, c='g', label='Testing data')
# Show the legend
plt.legend();

Anytime we can visualize our data, model, or predictions, it's a good idea to do so.


Time to build a model

With this graph in mind, what we'll be trying to do is build a model which learns the pattern in the blue dots (X_train) to draw the green dots (X_test).

# Set random seed
tf.random.set_seed(42)

# Create a model 
model = tf.keras.Sequential([
   tf.keras.layers.Dense(1, input_shape=[1]) 
   # define the input_shape to our model
])

# Compile model (same as above)  
model.compile(loss=tf.keras.losses.mae,
              optimizer=tf.keras.optimizers.SGD(),
              metrics=["mae"])
# Fit the model to the training data
model.fit(tf.expand_dims(X_train, axis=-1), y_train, epochs=100, verbose=0) 
# verbose controls how much gets output

Visualizing the predictions


Now that we have a trained model, let's visualize some predictions. To visualize predictions, it's always a good idea to plot them against the ground truth labels.


Often you'll see this in the form of y_test vs. y_pred (ground truth vs. predictions).

First, we'll make some predictions on the test data (X_test), remember the model has never seen the test data.

# Make predictions
y_preds = model.predict(X_test)
# View the predictions
y_preds

Let's create a plotting function to visualize the data:

def plot_predictions(train_data=X_train,
                     train_labels=y_train,
                     test_data=X_test,
                     test_labels=y_test,
                     predictions=y_preds):
  """
  Plots training data, test data and compares predictions.
  """
  plt.figure(figsize=(10, 7))
  # Plot training data in blue
  plt.scatter(train_data, train_labels, c="b", label="Training data")
  # Plot test data in green
  plt.scatter(test_data, test_labels, c="g", label="Testing data")
  # Plot the predictions in red (predictions were made on the test data)
  plt.scatter(test_data, predictions, c="r", label="Predictions")
  # Show the legend
  plt.legend();

Now let's generate a plot using this function:

plot_predictions(train_data=X_train,
                 train_labels=y_train,
                 test_data=X_test,
                 test_labels=y_test,
                 predictions=y_preds)

We can see our predictions aren't totally correct; let's run more experiments to improve the result.

Running experiments to improve a model


There are many ways to improve your model; here are the most common:

  1. Get more data - get more examples for your model to train on (more opportunities to learn patterns).

  2. Make your model larger (use a more complex model) - this might come in the form of more layers or more hidden units in each layer.

  3. Train for longer - give your model more of a chance to find the patterns in the data.

In a real-world situation, we often can't simply get more data, so let's experiment with options 2 and 3. We'll create three models with the following scenarios:

  1. model_1 - same as the original model, trained for 100 epochs.

  2. model_2 - 2 layers, trained for 100 epochs.

  3. model_3 - 2 layers, trained for 500 epochs.

Build model_1

# Set random seed
tf.random.set_seed(42)

# Replicate original model 
model_1 = tf.keras.Sequential([
  tf.keras.layers.Dense(1)
])

# Compile model 
model_1.compile(loss=tf.keras.losses.mae,
              optimizer=tf.keras.optimizers.SGD(),
              metrics=["mae"])
              
# Fit the model to the training data
model_1.fit(tf.expand_dims(X_train, axis=-1), y_train, epochs=100) 


Let's make predictions for model_1

# Make and plot predictions for model_1
y_preds_1 = model_1.predict(X_test)
plot_predictions(predictions=y_preds_1)

Not much improvement from the previous model. Let's build model_2.


Build model_2

This time we are adding an extra dense layer, keeping everything else the same


# Set random seed
tf.random.set_seed(42)

# Replicate model_1 and add an extra layer 
model_2 = tf.keras.Sequential([
  tf.keras.layers.Dense(1),
  tf.keras.layers.Dense(1) # add a second layer,
])
# Compile model 
model_2.compile(loss=tf.keras.losses.mae,
              optimizer=tf.keras.optimizers.SGD(),
              metrics=["mae"])
              
# Fit the model to the training data
model_2.fit(tf.expand_dims(X_train, axis=-1), y_train, epochs=100, verbose=0) # set verbose to 0 for less output 


Let's make predictions with model_2:

# Make and plot predictions for model_2
y_preds_2 = model_2.predict(X_test)
plot_predictions(predictions=y_preds_2)

It's looking far better than model_1 after adding an extra layer. Let's build the third model, keeping everything the same as model_2 except training for longer (500 epochs instead of 100).


Build model_3

Train this model for 500 epochs

# Set random seed
tf.random.set_seed(42)

# Replicate model_2 
model_3 = tf.keras.Sequential([
  tf.keras.layers.Dense(1),
  tf.keras.layers.Dense(1) # add a second layer,
])
# Compile model 
model_3.compile(loss=tf.keras.losses.mae,
              optimizer=tf.keras.optimizers.SGD(),
              metrics=["mae"])
              
# Fit the model to the training data
model_3.fit(tf.expand_dims(X_train, axis=-1), y_train, epochs=500, verbose=0) # set verbose to 0 for less output


Let's make predictions with model_3:

# Make and plot predictions for model_3
y_preds_3 = model_3.predict(X_test)
plot_predictions(predictions=y_preds_3)

Strangely, the model performed worse when trained for longer. As it turns out, when you train a model for too long, the results can get worse.
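One common guard against over-training is early stopping, which halts training once a monitored metric stops improving; a sketch using Keras' EarlyStopping callback (not part of the original experiment):

# Stop training once the loss hasn't improved for 10 epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor="loss", patience=10,
                                              restore_best_weights=True)
model_3.fit(tf.expand_dims(X_train, axis=-1), y_train, epochs=500,
            verbose=0, callbacks=[early_stop])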


Comparing the results


Now that we have the results of three similar models, let's compare them.

Before we compare the three models, let's create two functions that calculate the mean absolute error and mean squared error between the test labels and predictions.


def mae(y_test, y_pred):
  """
  Calculates mean absolute error between y_test and y_preds.
  """
  return tf.metrics.mean_absolute_error(y_test,
                                        y_pred)

def mse(y_test, y_pred):
  """
  Calculates mean squared error between y_test and y_preds.
  """
  return tf.metrics.mean_squared_error(y_test,
                                       y_pred)

Now let's calculate MAE and MSE for all three models.

# Calculate model_1 metrics
mae_1 = mae(y_test, y_preds_1.squeeze()).numpy()
mse_1 = mse(y_test, y_preds_1.squeeze()).numpy()
mae_1, mse_1


# Calculate model_2 metrics
mae_2 = mae(y_test, y_preds_2.squeeze()).numpy()
mse_2 = mse(y_test, y_preds_2.squeeze()).numpy()
mae_2, mse_2


# Calculate model_3 metrics
mae_3 = mae(y_test, y_preds_3.squeeze()).numpy()
mse_3 = mse(y_test, y_preds_3.squeeze()).numpy()
mae_3, mse_3


Let's compare the models:

import pandas as pd
model_results = [["model_1", mae_1, mse_1],
                 ["model_2", mae_2, mse_2],
                 ["model_3", mae_3, mse_3]]
all_results = pd.DataFrame(model_results, columns=["model", "mae", "mse"])
all_results               

The result of our experiments is that model_2 performs the best of the three. Comparing models is tedious; here we compared three, but in a real-world scenario we may need to compare many more.


But this is part of what machine learning modeling is about, trying many different combinations of models and seeing which performs best.


Another thing you'll also find is what you thought may work (such as training a model for longer) may not always work and the exact opposite is also often the case.


 
 
 
