ODEV
Back
Back | BLOG

May 31, 2024

Generative AI as an Assistant in Database Design

The advent of AI, particularly LLMs (Large Language Models), has undoubtedly revolutionized the software world, transforming the development process by providing advanced tools and capabilities that enhance efficiency, quality, and collaboration at all stages of the software development lifecycle.

snomed-ct

Today, generative AI can generate source code from a list of requirements or descriptions given in natural language, act as programming assistants by providing suggestions and corrections as the developer writes code, and generate documentation from the source code, thereby improving code comprehension and collaboration among developers. These capabilities help improve productivity, quality, and accelerate the software development process.

In this article, I aim to review how generative AI behaves when it comes to software modeling, specifically database modeling using ERM diagrams. I will evaluate its capabilities and limitations to determine if it is helpful in the early stages of software construction. To do this, I posed a series of requirements to ChatGPT, asked it to create an entity-relationship model (ERM), and subsequently analyzed the resulting representation.

juan-ignacio-odev

Juan Ignacio

Senior Software Engineer

clock

17 minutes read

Have a project?

Tell us more
arrow-upper-right-blackarrow-upper-right-white

Reviewing Some Theoretical Concepts

Some topics I intend to discuss bring to mind concepts, definitions, or strategies that I learned during my student days and that help me address the subject. Reviewing old university notes from the Database course, but still valid and to be taken into account when modeling a problem, these indicate that to ensure the quality of conceptual schemas, designs must meet a series of properties:

item

Completeness:

All relevant features of the problem are represented.

item

Correctness:

Syntactic: The model represents the concepts using appropriate specifications given by the language.

Semantic: The model represents the problem.

item

Minimality:

An element of reality is represented only once in the schema.

item

Expressiveness:

Represents reality in a natural and understandable form using only the semantics of the model.

item

Explicitness:

Does not use more formalisms than the E-R diagram.

A good design should maximize Completeness and Correctness and find a balance between Expressiveness and Explicitness.

In the next section, I present a series of case studies where I asked ChatGPT to model a set of requirements to verify the quality of the result by analyzing whether it meets each of the mentioned points: Completeness, Correctness, Minimality, Expressiveness, and Explicitness.

Case Studies

Case 1

Requirements:

A construction company, based on the design of its projects, creates material requirements documents. Each requirement generates more than one purchase order. It wants to model what requirements require what materials, originating what orders.

ChatGPT Solution

item

Entities:

Project

Material

Requirement

Purchase Order

Attributes:

item

Project:

ProjectID (Primary Key)

ProjectName

ProjectDescription

item

Material:

MaterialID (Primary Key)

MaterialName

MaterialDescription

item

Requirement:

RequirementID (Primary Key)

ProjectID (Foreign Key referencing Project)

MaterialID (Foreign Key referencing Material)

Quantity

RequirementDescription

item

Purchase Order:

OrderID (Primary Key)

RequirementID (Foreign Key referencing Requirement)

OrderDate

DeliveryDate

OrderStatus

Relationships:

item

One-to-Many Relationship between Project and Requirement:

A Project can have multiple Requirements.

A Requirement belongs to only one Project.

item

Many-to-Many Relationship between Material and Requirement:

A Material can be required by multiple Requirements.

A Requirement can require multiple Materials.

item

One-to-Many Relationship between Requirement and Purchase Order:

A Requirement can generate multiple Purchase Orders.

A Purchase Order is generated from only one Requirement.


Next, I asked to add integrity constraints to the previous model, and ChatGPT's response was as follows:

item

Entity Integrity:

Each entity must have a unique identifier (primary key) that uniquely identifies each instance of the entity.

Example: ProjectID for Project, MaterialID for Material, RequirementID for Requirement, and OrderID for Purchase Order.

item

Referential Integrity:

Foreign key values must match the primary key values in the referenced table or be null.

Example: ProjectID in Requirement must exist in Project, MaterialID in Requirement must exist in Material, and RequirementID in Purchase Order must exist in Requirement.

item

Domain Integrity:

Attributes should have appropriate data types and constraints to ensure data accuracy.

Example: Quantity in Requirement should be a positive integer, OrderDate and DeliveryDate in Purchase Order should be valid dates, etc.

item

Business Rules:

Define and enforce any specific business rules related to the relationships between entities.

Example: Ensure that a Purchase Order can only be generated for a Requirement that belongs to the same Project.

item

Cascade Update/Delete:

Define whether updates or deletions to primary key values cascade to related foreign key values.

Example: If a ProjectID is updated in Project, corresponding Requirement records should also be updated with the new ProjectID.

item

Constraints on Order Status:

Define constraints on the possible values for OrderStatus to ensure consistency and validity.

Example: OrderStatus can only have values like "Pending", "In Progress", "Completed", etc.

Representation Analysis

While the design models the presented reality, I will now analyze its quality.

Completeness: All relevant entities of the problem and their relationships are represented, so we can say that the design is complete.

Correctness: The entire problem is represented in the schema, although it adds attributes to the entities that are not required. Ignoring this last point, we can say it is semantically correct, but the notation used is descriptive and not represented through an ERM language. In fact, it has many limitations to show the design graphically, so we cannot say that the model is syntactically correct.

Expressiveness and Explicitness: The result is understandable if we ask for a textual description of the model since the graphical representation turns out to be very confusing as it has many shortcomings and errors, not making clear which entities are related, what their cardinality is, totality, etc. Thus, we cannot say that the model meets the properties of expressiveness and explicitness.

Minimality: Each element of the problem is uniquely represented, so we can say that the model is minimal.

All problems can have multiple solutions, and all can be correct and of quality considering the above points. Although the model presented by ChatGPT is correct, it has some details that were not considered. For example, this representation allows for "Material - Requirement" pairs without an associated "Purchase Order." This could be corrected in several ways, for example:

item

Using multiple relationships:

snomed-ct
item

Using Aggregation and Totality:

snomed-ct

Case 2

Requirements:

Each mechanic repairs several cars. Each mechanic uses a certain tool depending on the car he works on.

ChatGPT Solution

item

Entities:

Mechanic

Car

Tool

Repairs

Uses

Attributes:

item

Mechanic:

MechanicID (PK)

Name

Experience Level

Contact Info

item

Car:

CarID (PK)

Make

Model

Year

Owner Info

item

Tool:

ToolID (PK)

ToolName

ToolType

Usage Instructions

item

Repairs:

MechanicID (FK)

CarID (FK)

RepairDate

RepairDetails

item

Uses:

MechanicID (FK)

ToolID (FK)

CarID (FK)

UsageDate

SpecificUsageDetails

Relationships:

item

Many-to-Many Relationship between Car and Mechanic:

Each Mechanic can repair several Cars.

Each Car can have several Mechanics working on it.

item

Many-to-Many Relationship between Car and Tool:

Several Tools can be used in each Car.

Each Tool can be used on several Cars.


Next, I asked to add integrity constraints to the previous model, and ChatGPT's response was as follows:

item

Entity Integrity:

Each entity must have a unique identifier (primary key) that uniquely identifies each instance of the entity.

Example: MechanicID for Mechanic, CarID for Car, ToolID for Tool.

item

Referential Integrity:

Repairs: MechanicID (Foreign Key) must exist in Mechanic table - CarID (Foreign Key) must exist in Car table.

Uses: MechanicID (Foreign Key) must exist in Mechanic table - ToolID (Foreign Key) must exist in Tool table - CarID (Foreign Key) must exist in Car table.

Representation Analysis

While the design models the presented reality, I will now analyze its quality.

Completeness: All relevant entities of the problem and their relationships are represented, so we can say that the design is complete.

Correctness: As in Case 1, it adds attributes to the entities and relationships that are not in the requirements. The entire problem is represented in the schema, so in this case, we can also say it is semantically correct. Beyond the notation used and that the model is not represented through an ERM language, this model has integrity issues as it allows a "Mechanic" to repair a Car with Tools that are not used for that Car. For both reasons, we cannot say that the model is syntactically correct.

Expressiveness and Explicitness: As in Case 1 and for the same reasons, we can say that the model does not meet the properties of expressiveness and explicitness.

Minimality: It can be observed that in the model, the Mechanic relates to the car in two ways:
Repair: The cars they repair.
Uses: The tools they use to repair it.

In my opinion, these relationships are redundant and unnecessary, so we cannot say that the property of minimality is met. This problem can be solved using aggregation, where the "Repairs" relationship is a strong entity in the "Uses" relationship.

snomed-ct

Conclusion

The design phase is a crucial stage in software construction as the quality and success of a project are tied to good design. Artificial intelligence (AI) has introduced tools and techniques that have significantly improved efficiency, precision, and innovation in software development. It has become a fundamental ally for developers by facilitating the automation of tasks such as debugging and code generation, and improving team collaboration by automating documentation and providing intelligent interfaces that efficiently integrate and organize information. This allows developers to focus on more creative and strategic aspects.

In the early stages of software development, such as the design phase, while AI can be helpful, it should not be taken as an absolute or fundamental truth. Rather, it should be seen as a support tool or a second opinion from a colleague who may have a different perspective on the problem and therefore a different solution or idea, which helps in making strategic decisions to achieve an optimal solution.

France
phone

+33 7 88 89 33 89

phone

Chamonix, France 74400

UK
phone

+44 20 3286 2904

phone

2 Sandstone Quarry, 3 Carlton Road, Tunbridge Wells, TN1 2JS

Uruguay
phone

+59 897 286 917

phone

Ejido 1275, 11100, Montevideo, Uruguay

Let's Connect!
Copyright © 2024 by odev.tech