Part 3: How to Analyze a Database File with GPT-3.5
In part 3 of this blog series, we'll explore the proper usage of data analysis with ChatGPT-3.5 and how you can analyze and visualize data from a SQLite database to help you make the most of your data.
Suppose we have a database file for a gold broker company that sells gold and we want to analyze the total monthly sales for each product. What do we do and what is the logical question to understand that gold.db?
Tools You Will Need!
The first step:
Open the gold.db file in the database browser (SQLite DB Browser) and copy the database schema information:
CREATE TABLE Customer ( CustomerID INTEGER PRIMARY KEY AUTOINCREMENT, FirstName TEXT, LastName TEXT, Email TEXT, Phone TEXT, Address TEXT, RegistrationDate DATE );
CREATE TABLE OrderDetails ( OrderDetailID INTEGER PRIMARY KEY AUTOINCREMENT, OrderID INTEGER, ProductID INTEGER, Quantity INTEGER, PriceAtTimeOfPurchase REAL, FOREIGN KEY (OrderID) REFERENCES Orders(OrderID);
Updated reference FOREIGN KEY (ProductID) REFERENCES Product(ProductID) ); CREATE TABLE Orders ( OrderID INTEGER PRIMARY KEY AUTOINCREMENT, CustomerID INTEGER, OrderDate DATE, Status TEXT, FOREIGN KEY (CustomerID);
REFERENCES Customer(CustomerID) ); CREATE TABLE Product ( ProductID INTEGER PRIMARY KEY AUTOINCREMENT, ProductName TEXT, MetalType TEXT, Form TEXT, Weight REAL, Purity REAL, StockQuantity INTEGER, PricePerUnit REAL );
CREATE TABLE Supplier ( SupplierID INTEGER PRIMARY KEY AUTOINCREMENT, SupplierName TEXT, Contact TEXT, Email TEXT, Phone TEXT ); CREATE TABLE Supply ( SupplyID INTEGER PRIMARY KEY AUTOINCREMENT, SupplierID INTEGER, ProductID INTEGER, SupplyDate DATE, Quantity INTEGER, PricePerUnit REAL, FOREIGN KEY (SupplierID) REFERENCES Supplier(SupplierID), FOREIGN KEY (ProductID) REFERENCES Product(ProductID);
The second step:
Write the following question in ChatGPT!
You are a data engineer and have a sqlite3 database with the following schema. You should write a python script that you can run in a Jupyter Notebook that should draw a diagram. The data in the database comes from sales of a gold broker company. Build a chart that shows total sales of products by month as well as group sales by product types.
Database schema:
CREATE TABLE Customer ( CustomerID INTEGER PRIMARY KEY AUTOINCREMENT, FirstName TEXT, LastName TEXT, Email TEXT, Phone TEXT, Address TEXT, RegistrationDate DATE ); CREATE TABLE OrderDetails ( OrderDetailID INTEGER PRIMARY KEY AUTOINCREMENT, OrderID INTEGER, ProductID INTEGER, Quantity INTEGER, PriceAtTimeOfPurchase REAL, FOREIGN KEY (OrderID) REFERENCES Orders(OrderID), -- Updated reference FOREIGN KEY (ProductID) REFERENCES Product(ProductID) ); CREATE TABLE Orders ( OrderID INTEGER PRIMARY KEY AUTOINCREMENT, CustomerID INTEGER, OrderDate DATE, Status TEXT, FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID) ); CREATE TABLE Product ( ProductID INTEGER PRIMARY KEY AUTOINCREMENT, ProductName TEXT, MetalType TEXT, Form TEXT, Weight REAL, Purity REAL, StockQuantity INTEGER, PricePerUnit REAL ); CREATE TABLE Supplier ( SupplierID INTEGER PRIMARY KEY AUTOINCREMENT, SupplierName TEXT, Contact TEXT, Email TEXT, Phone TEXT ); CREATE TABLE Supply ( SupplyID INTEGER PRIMARY KEY AUTOINCREMENT, SupplierID INTEGER, ProductID INTEGER, SupplyDate DATE, Quantity INTEGER, PricePerUnit REAL, FOREIGN KEY (SupplierID) REFERENCES Supplier(SupplierID), FOREIGN KEY (ProductID) REFERENCES Product(ProductID).
The answer will be as follows in ChatGPT (code):
import sqlite3 import pandas as pd import matplotlib.pyplot as plt # Step 1: Connect to the SQLite database db_connection = sqlite3.connect('your_database.db') # Replace 'your_database.db' with the actual database file path # Step 2: Query the database to retrieve the required data query = """ SELECT strftime('%Y-%m', o.OrderDate) AS Month, p.MetalType AS ProductType, SUM(od.Quantity * od.PriceAtTimeOfPurchase) AS TotalSales FROM Orders o JOIN OrderDetails od ON o.OrderID = od.OrderID JOIN Product p ON od.ProductID = p.ProductID GROUP BY Month, ProductType ORDER BY Month """ # Execute the query and fetch the data into a Pandas DataFrame df = pd.read_sql_query(query, db_connection) # Step 3: Pivot the data for plotting pivot_df = df.pivot(index='Month', columns='ProductType', values='TotalSales') pivot_df.fillna(0, inplace=True) # Step 4: Create the line chart using Matplotlib plt.figure(figsize=(12, 6)) for product_type in pivot_df.columns: plt.plot(pivot_df.index, pivot_df[product_type], marker='o', linestyle='-', label=product_type) plt.xlabel('sales Month and year') plt.ylabel('Total Sales') plt.title('Total Product Sales by Month') plt.legend(loc='upper left') plt.grid(True) # Rotate x-axis labels for better readability plt.xticks(rotation=90) plt.show() # Close the database connection db_connection.close()
The third step:
Copy the code as it is, but you have to change the path to the gold.db file inside the brackets.
Conclusion!
ChatGPT can be a game-changer in the field of data analysis, offering a valuable resource for generating insights, explanations, and visualizations. By leveraging this advanced AI model, you can streamline your data analysis process, enhance your decision-making, and unlock the potential of your data.
Remember that while ChatGPT can assist with data analysis, it's most effective when used in conjunction with human expertise and a comprehensive understanding of your data. With the right approach, ChatGPT can be a powerful ally in your quest for data-driven insights.