Implementing the Java File API on Top of JDBC

Advanced Database Systems

CPS 216, Fall 2001

Jing Zhang, Sanjay Banerjee

jzhang@cs.duke.edu, skb@ee.duke.edu

Advisor: Dr. Jun Yang


Introduction

 

Motivation

 

File systems are a dynamic and interesting area. As resources become more plentiful, the opportunities to improve file systems increase. Before a change to a file system is applied, however, the idea behind the proposed modification should be tested. One way to do this is to represent the file system in a database, so that different modifications can be tested in the database-backed file system before an actual file system implementation is built.

 

Our Project Goal

 

Our goal is to implement a file system in a database, accessed through a variation of the java.io.File API. The target database is MySQL, which Java accesses through the JDBC API.

We also make a simple comparison of the performance of the original java.io.File API and our version. The most challenging part is finding a way to represent a file in a database system.

 

Why the JDBC API?

 

In simplest terms, a JDBC technology-based driver ("JDBC driver") makes it possible to do three things:

1.       Establish a connection with a data source

2.       Send queries and update statements to the data source

3.       Process the results
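The three steps above map directly onto JDBC calls. The following is a minimal sketch, not code taken from Fdbase.java: it assumes a table named FileMeta with the columns listed in the Database Design section, and it assumes that ParentDirectory holds the name of an entry's parent directory. The host, database name, and credentials are placeholders. It uses try-with-resources, so it needs Java 7 or later rather than the JDK 1.2 we used, and a MySQL JDBC driver must be on the classpath when listNames is actually called.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing the three JDBC steps against the FileMeta table.
class JdbcSteps {

    // Assumed query: entries whose ParentDirectory column equals the given name.
    static final String LIST_SQL =
            "SELECT FileName FROM FileMeta WHERE ParentDirectory = ?";

    // Builds a MySQL JDBC URL; host and database name are placeholders.
    static String mysqlUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database;
    }

    static List<String> listNames(String url, String user, String password,
                                  String parent) throws SQLException {
        List<String> names = new ArrayList<>();
        // Step 1: establish a connection with the data source.
        try (Connection conn = DriverManager.getConnection(url, user, password);
             // Step 2: send a parameterized query to the data source.
             PreparedStatement stmt = conn.prepareStatement(LIST_SQL)) {
            stmt.setString(1, parent);
            try (ResultSet rs = stmt.executeQuery()) {
                // Step 3: process the results one row at a time.
                while (rs.next()) {
                    names.add(rs.getString("FileName"));
                }
            }
        }
        return names;
    }
}
```

Using a PreparedStatement with a "?" placeholder lets the driver handle quoting of directory names instead of splicing them into the SQL string.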

The JDBC API supports both two-tier and three-tier models for database access.

In the two-tier model, a Java applet or application talks directly to the data source. This requires a JDBC driver that can communicate with the particular data source being accessed. This is referred to as a client/server configuration, with the user's machine as the client, and the machine housing the data source as the server.

 

 

Fig. 1: A two-tier architecture for data access

 

In the three-tier model, commands are sent to a "middle tier" of services, which then sends the commands to the data source. The data source processes the commands and sends the results back to the middle tier, which then sends them to the user.

Fig. 2: A three-tier architecture for database access

 

Methodology

 

1.       Environment Setup

Hardware:

·         2 Pentium III machines

·         OS: Red Hat Linux 7.2, kernel 2.4.3-12

Software:

·         MySQL Ver 11.13 Distrib 3.23.36, for redhat-linux-gnu (i386)

·         MySQL JDBC Driver Version 1.2c

·         Java JDK 1.2
           
The project requires a computer with Java and a database management system installed. We chose MySQL, a relational database management system, as our primary database because it is open-source software and is fast, reliable, and easy to use.

 

We installed MySQL and the JDBC driver on two Red Hat Linux 7 machines.

 

2.       File System Structure

 

We set up a simple file structure, shown in the following chart. All of our files and subdirectories are under the “home/cps216/project/filedata” directory.

 

                                              

 

home/cps216/project/filedata
    |-- 1.txt
    |-- 1stset
    |     |-- 1.txt
    |     |-- 2.txt
    |     |-- sub
    |           |-- 1.txt
    |-- 2ndset
    |     |-- 3.txt
    |-- 3rdset
          |-- 4.txt
          |-- sub1
                |-- sub11
                      |-- sub111
                            |-- 2.txt
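The timing comparison requires the same tree to exist on disk as well as in the database. The sketch below shows one way to recreate it under an arbitrary root, such as a temporary directory. The class name SampleTree and the placeholder file contents are hypothetical, since the report does not say what the files contain.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper that recreates the sample tree under a given root.
class SampleTree {

    // Relative paths of the seven files in the chart; directories are implied.
    static final String[] FILES = {
        "1.txt",
        "1stset/1.txt",
        "1stset/2.txt",
        "1stset/sub/1.txt",
        "2ndset/3.txt",
        "3rdset/4.txt",
        "3rdset/sub1/sub11/sub111/2.txt"
    };

    // Creates every file, and any missing parent directories, under root.
    // The contents written are placeholders, not the project's real data.
    static void build(File root) {
        for (String rel : FILES) {
            File f = new File(root, rel);
            File parent = f.getParentFile();
            if (!parent.isDirectory() && !parent.mkdirs()) {
                throw new UncheckedIOException(
                        new IOException("cannot create " + parent));
            }
            try (FileWriter w = new FileWriter(f)) {
                w.write("placeholder for " + rel + "\n");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```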

 

 

3.       Database Design

 

Our database consists of two tables.  These tables are shown below. 

 

FileMeta Table

Field            Type           Null   Key   Default               Extra
FileName         varchar(20)           PRI
Path             varchar(100)          PRI
IsDirectory      char(1)
ParentDirectory  varchar(15)    YES          NULL
FileID           int(11)                     0
Size             int(11)        YES          NULL
Writable         char(1)        YES          NULL
Readable         char(1)        YES          NULL
LastModified     datetime                    0000-00-00 00:00:00

FileContents Table

Field            Type           Null   Key   Default               Extra
FileID           int(11)               PRI   0
Offset           int(11)               PRI   0
Contents         varchar(254)   YES          NULL

The first table, FileMetaData, stores file metadata: all the information about a particular file except its actual content. Its primary key is the combination of the file name and the path. Another key is the FileID. Other fields record the file size and the file permissions. For directory control, we also assign a FileID to each directory. The IsDirectory field distinguishes the two cases: “Y” means the FileID refers to a directory, and “N” means it refers to an ordinary file.
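To illustrate how a directory tree maps onto metadata rows, here is a sketch that walks a directory with java.io.File and produces one row per entry. It is not the project's loader (a method of adding real files to the database is listed under Future Work). We assume that Path holds the path of the containing directory, that ParentDirectory holds the parent's name, that FileIDs are assigned sequentially in a depth-first walk, and that directories get a Size of 0. The class names FileMetaRow and MetaLoader are hypothetical.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// One metadata row (a subset of the columns); hypothetical class.
class FileMetaRow {
    final String fileName;
    final String path;            // assumed: path of the containing directory
    final char isDirectory;       // 'Y' = directory, 'N' = ordinary file
    final String parentDirectory; // assumed: name of the containing directory
    final int fileId;
    final long size;              // assumed: 0 for directories

    FileMetaRow(String fileName, String path, char isDirectory,
                String parentDirectory, int fileId, long size) {
        this.fileName = fileName;
        this.path = path;
        this.isDirectory = isDirectory;
        this.parentDirectory = parentDirectory;
        this.fileId = fileId;
        this.size = size;
    }
}

class MetaLoader {
    // Walks dir depth-first, appending one row per entry and numbering
    // FileIDs sequentially from nextId. Returns the next unused FileID.
    static int collect(File dir, List<FileMetaRow> rows, int nextId) {
        File[] children = dir.listFiles();
        if (children == null) {
            return nextId; // not a directory, or it could not be read
        }
        Arrays.sort(children); // deterministic order, so FileIDs are reproducible
        for (File child : children) {
            boolean isDir = child.isDirectory();
            rows.add(new FileMetaRow(child.getName(), dir.getPath(),
                    isDir ? 'Y' : 'N', dir.getName(), nextId++,
                    isDir ? 0L : child.length()));
            if (isDir) {
                nextId = collect(child, rows, nextId);
            }
        }
        return nextId;
    }
}
```

Each row would then become one INSERT into the metadata table; with (FileName, Path) as the key, two files with the same name are distinguished by the directory that contains them.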

 

The second table, FileContents, stores the file contents as character strings. Its primary key is the combination of the FileID and the offset. The offset can be found using the Size field from the FileMetaData table. Because Contents is a VARCHAR column, and MySQL truncates any value assigned to a VARCHAR column that exceeds the column's maximum length (VARCHAR is limited to 255 characters, and Contents is declared as varchar(254)), we chop each file's content into smaller pieces and store them in the database as separate tuples.
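The splitting step can be sketched as follows. We assume here that Offset is the character position at which each piece starts (the report leaves its exact meaning open), and we use a piece size of 254 to match the declared width of the Contents column. ContentChunker and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits file content into FileContents rows and back.
class ContentChunker {
    // Matches the declared width of the Contents column, varchar(254).
    static final int CHUNK_SIZE = 254;

    // One FileContents row; (fileId, offset) is the primary key.
    static final class Chunk {
        final int fileId;
        final int offset;      // assumed: character position where this piece starts
        final String contents; // at most CHUNK_SIZE characters

        Chunk(int fileId, int offset, String contents) {
            this.fileId = fileId;
            this.offset = offset;
            this.contents = contents;
        }
    }

    // Cuts content into consecutive pieces of at most CHUNK_SIZE characters.
    static List<Chunk> split(int fileId, String content) {
        List<Chunk> chunks = new ArrayList<>();
        for (int off = 0; off < content.length(); off += CHUNK_SIZE) {
            int end = Math.min(off + CHUNK_SIZE, content.length());
            chunks.add(new Chunk(fileId, off, content.substring(off, end)));
        }
        return chunks;
    }

    // Reassembles content; chunks must already be in Offset order
    // (for example, the result of a query with ORDER BY Offset).
    static String join(List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder();
        for (Chunk c : chunks) {
            sb.append(c.contents);
        }
        return sb.toString();
    }
}
```

For a 600-character file this yields three rows with offsets 0, 254, and 508, the last holding 92 characters. One caveat worth checking: MySQL versions before 5.0.3 remove trailing spaces from values stored in a VARCHAR column, so under the 3.23 release used here a piece that happens to end in spaces would not round-trip unchanged.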

 

 

4.       Program Design

 

Our implementation consists of two main programs, Fdui.java and Fdbase.java.

Fdui.java provides the user interface. It currently supports individual file retrieval, listing of files and directories, and recursive delete. Fdbase.java implements file retrieval and listing by querying the MySQL database through the JDBC API.
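A recursive delete has to remove a directory's children before the directory itself, and remove each entry's FileContents rows along with its metadata row. The sketch below writes that traversal against a small hypothetical interface, FileStore, whose methods stand for the queries noted in the comments; it is not the code in Fdui.java or Fdbase.java. The in-memory MemoryStore is included only so the logic can be exercised without a database; a JDBC-backed implementation would issue the corresponding SELECT and DELETE statements.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical view of the two tables, reduced to what recursive delete needs.
interface FileStore {
    List<Integer> childrenOf(int fileId); // FileIDs whose parent is fileId
    boolean isDirectory(int fileId);      // IsDirectory = 'Y'
    void deleteContents(int fileId);      // e.g. DELETE FROM FileContents WHERE FileID = ?
    void deleteMeta(int fileId);          // e.g. DELETE FROM FileMeta WHERE FileID = ?
}

class RecursiveDelete {
    // Deletes children before their parent, so no remaining row
    // ever refers to a directory that has already been removed.
    static void delete(FileStore store, int fileId) {
        if (store.isDirectory(fileId)) {
            // Copy the child list so the store may change while we iterate.
            for (int child : new ArrayList<>(store.childrenOf(fileId))) {
                delete(store, child);
            }
        }
        store.deleteContents(fileId);
        store.deleteMeta(fileId);
    }
}

// In-memory stand-in used only to exercise the algorithm.
class MemoryStore implements FileStore {
    final Map<Integer, Integer> parent = new HashMap<>();
    final Map<Integer, Boolean> dir = new HashMap<>();
    final List<Integer> deleted = new ArrayList<>(); // deletion order, for inspection

    void add(int id, Integer parentId, boolean isDir) {
        parent.put(id, parentId);
        dir.put(id, isDir);
    }

    public List<Integer> childrenOf(int fileId) {
        List<Integer> kids = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : parent.entrySet()) {
            Integer p = e.getValue();
            if (p != null && p == fileId) {
                kids.add(e.getKey());
            }
        }
        return kids;
    }

    public boolean isDirectory(int fileId) { return dir.getOrDefault(fileId, false); }

    public void deleteContents(int fileId) { /* no file contents kept in memory */ }

    public void deleteMeta(int fileId) {
        parent.remove(fileId);
        dir.remove(fileId);
        deleted.add(fileId);
    }
}
```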

 

5.       Timing Method

 

We compared the running time of the java.io.File list method with that of our listing function. For this test, we created identical file structures in the actual file system and in the database, then ran each listing function on its own file structure. The graph below shows the results.

The first iteration incurred a large penalty compared with later iterations; this may be due to memory accesses. Typically, our database-backed listing took four to five times as long as the java.io.File method. Occasionally, as in iteration 7, the java.io.File method took noticeably longer to respond, possibly because of garbage collection. More tests are needed to verify this.
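Our timing code is not reproduced in this report. As a sketch of the method described above, the harness below times each iteration separately rather than averaging, which keeps effects such as the first-iteration penalty and occasional slow runs visible. It uses System.nanoTime and lambdas, so it needs Java 8 rather than JDK 1.2 (where System.currentTimeMillis was the available timer). The class ListingTimer is hypothetical; a database-backed listing would be wrapped in a Runnable in the same way as the java.io.File baseline shown here.

```java
import java.io.File;

// Hypothetical timing harness: runs a listing operation repeatedly
// and records the elapsed time of each run individually.
class ListingTimer {

    // Returns the elapsed wall-clock time of each iteration in nanoseconds.
    static long[] time(Runnable listing, int iterations) {
        long[] elapsed = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            listing.run();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    // Baseline workload: java.io.File.list() on a single directory.
    static Runnable fileListing(File dir) {
        return () -> {
            String[] names = dir.list();
            if (names == null) {
                throw new IllegalStateException(dir + " is not a directory");
            }
        };
    }
}
```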

 

Future Work

 

The main purpose of this project was to create a platform for future work. Several variations of the current system remain to be tested. One performance test would use a deeper directory structure and increase the number of files in the system to 100. To make this test feasible, a method of adding real files to the database needs to be implemented. Another possible addition is concurrency control. The methods used to access the database file system can also be optimized for MySQL. Finally, a better user interface can be developed. Overall, there is plenty of room for future development and research.

 

 


 

 

Appendix

1.       fdui.java

2.       Fdbase.java

3.       metadatatable.txt

4.       filecontenttable.txt

5.       insertcontents.sql

6.       insertmetadata.sql

7.       structure.txt