Implementing the Java File API on Top of JDBC
Advanced Database Systems
CPS 216, Fall 2001
Jing Zhang, Sanjay Banerjee
jzhang@cs.duke.edu, skb@ee.duke.edu
Advisor: Dr. Jun Yang
Introduction
Motivation
The area of file systems is a dynamic and interesting one. As resources become more plentiful,
the opportunities to improve file systems increase. But before any changes to a file system can
be applied, the theory behind the proposed modification should be tested. One way to represent a
file system is to use a database. By doing so, different modifications can be tested in the file
database before creating an actual file system implementation.
Our Project Goal
Our goal is to implement a file system in a database and to access it through a variation of the
java.io.File API. The target database is MySQL, which Java accesses through the JDBC API.
We also make a simple performance comparison between the original java.io.File API and our
version. The most challenging part is finding a way to represent a file in a database system.
In simplest terms, a JDBC technology-based driver ("JDBC driver") makes it possible to do three
things:
1. Establish a connection with a data source
2. Send queries and update statements to the data source
3. Process the results
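The three steps can be sketched in Java as follows. This is a minimal illustration rather than code from the project: the class name JdbcSteps and the helpers mysqlUrl and firstColumn are hypothetical, and the sketch is written for a current JDK (it uses try-with-resources) rather than for JDK 1.2.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class JdbcSteps {
    // Hypothetical helper: builds a MySQL JDBC URL from a host and a database name.
    static String mysqlUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database;
    }

    // Runs the three steps and returns the first column of every result row.
    static List<String> firstColumn(String url, String user, String password, String sql)
            throws SQLException {
        List<String> values = new ArrayList<>();
        // 1. Establish a connection with the data source.
        try (Connection conn = DriverManager.getConnection(url, user, password);
             // 2. Send a query to the data source.
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            // 3. Process the results.
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }
}
```

Calling firstColumn requires a running database server and a JDBC driver on the classpath; the connection, statement, and result set are closed automatically when the try block exits.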
The JDBC API supports both two-tier and three-tier models for database access.
In the two-tier model, a Java applet or application talks directly to the data source. This
requires a JDBC driver that can communicate with the particular data source being accessed. This
is referred to as a client/server configuration, with the user's machine as the client, and the
machine housing the data source as the server.
Fig. 1: A two-tier architecture for data access

In the three-tier model, commands are
sent to a "middle tier" of services, which then sends the commands to
the data source. The data source processes the commands and sends the results
back to the middle tier, which then sends them to the user.

Methodology
1. Environment Setup
Hardware:
· 2 Pentium III machines
· OS: Red Hat Linux 7.2, kernel 2.4.3-12
Software:
· MySQL Ver 11.13 Distrib 3.23.36, for redhat-linux-gnu (i386)
· MySQL JDBC Driver Version 1.2c
· Java JDK 1.2
Our project required a computer with Java and a database management system installed. We chose
MySQL, a relational database management system, as our database because it is open source
software and is fast, reliable, and easy to use.
We installed MySQL and the JDBC driver on both Red Hat Linux machines.
2. File System Structure
We set up a simple file structure, shown in the following chart. All of our files and
subdirectories are under the “home/cps216/project/filedata” directory.
                               |------1.txt
                               |              |--1.txt
                               |------1stset--|--2.txt
                               |              |--sub--1.txt
home/cps216/project/filedata---|
                               |------2ndset--3.txt
                               |
                               |              |--4.txt
                               |------3rdset--|--sub1-sub11-sub111-2.txt
3. Database Design
Our database consists of two tables. These tables are shown below.
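Table 1 (FileMetaData):
+-----------------+--------------+------+-----+---------------------+-------+
| Field           | Type         | Null | Key | Default             | Extra |
+-----------------+--------------+------+-----+---------------------+-------+
| FileName        | varchar(20)  |      | PRI |                     |       |
| Path            | varchar(100) |      | PRI |                     |       |
| IsDirectory     | char(1)      |      |     |                     |       |
| ParentDirectory | varchar(15)  | YES  |     | NULL                |       |
| FileID          | int(11)      |      |     | 0                   |       |
| Size            | int(11)      | YES  |     | NULL                |       |
| Writable        | char(1)      | YES  |     | NULL                |       |
| Readable        | char(1)      | YES  |     | NULL                |       |
| LastModified    | datetime     |      |     | 0000-00-00 00:00:00 |       |
+-----------------+--------------+------+-----+---------------------+-------+

Table 2 (file contents):
+----------+--------------+------+-----+---------+-------+
| Field    | Type         | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| FileID   | int(11)      |      | PRI | 0       |       |
| Offset   | int(11)      |      | PRI | 0       |       |
| Contents | varchar(254) | YES  |     | NULL    |       |
+----------+--------------+------+-----+---------+-------+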
The first table is called FileMetaData. It stores the metadata for each file, such as the file
size and the file permissions, but not the actual file content. The primary key of this table is
the combination of the file name and the path. Each entry also has a FileID, which is the key
used in the second table. For directory control, we also assign a FileID to each directory. The
“IsDirectory” field distinguishes the two cases: “Y” means that the FileID refers to a
directory, and “N” means that it refers to a regular file.
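For reference, the FileMetaData column listing can be turned back into a table definition. The sketch below is reconstructed from that listing and is not the authors' own creation script; the class name FileMetaDataSchema and the create method are hypothetical.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class FileMetaDataSchema {
    // DDL reconstructed from the column listing; columns without "YES" under Null are NOT NULL.
    static final String CREATE_SQL =
        "CREATE TABLE FileMetaData ("
        + "FileName VARCHAR(20) NOT NULL, "
        + "Path VARCHAR(100) NOT NULL, "
        + "IsDirectory CHAR(1) NOT NULL, "
        + "ParentDirectory VARCHAR(15), "
        + "FileID INT NOT NULL DEFAULT 0, "
        + "Size INT, "
        + "Writable CHAR(1), "
        + "Readable CHAR(1), "
        + "LastModified DATETIME NOT NULL, "
        + "PRIMARY KEY (FileName, Path))";

    // Creates the table on an already open connection.
    static void create(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(CREATE_SQL);
        }
    }
}
```

The composite primary key means that two entries may share a file name as long as their paths differ, which is what allows 1.txt to appear in several directories of the test structure.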
The second table contains the file contents stored as character arrays. The primary key of this
table is the combination of the FileID and the offset. The offset can be found using the Size
field from the FileMetaData table. Contents is a VARCHAR column, and MySQL truncates any value
assigned to a VARCHAR column that exceeds the column's maximum length (at most 255 characters).
For this reason, we need to chop the file content into smaller pieces and store them in the
database as separate tuples.
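The chopping step can be illustrated with a short sketch. It assumes that each piece holds at most 254 characters (the declared length of the Contents column) and that the Offset of a piece is the character position at which it starts; the report does not spell out the offset convention, so that part is an assumption, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.List;

class ContentChunker {
    // Matches the varchar(254) Contents column.
    static final int CHUNK_SIZE = 254;

    // Splits content into consecutive pieces of at most chunkSize characters.
    static List<String> split(String content, int chunkSize) {
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < content.length(); start += chunkSize) {
            int end = Math.min(start + chunkSize, content.length());
            chunks.add(content.substring(start, end));
        }
        return chunks;
    }

    // Assumed convention: the offset of piece i is the position of its first character.
    static int offsetOf(int chunkIndex, int chunkSize) {
        return chunkIndex * chunkSize;
    }

    // Number of pieces needed for a file of the given size (rounded up).
    static int chunkCount(int size, int chunkSize) {
        return (size + chunkSize - 1) / chunkSize;
    }
}
```

Under these assumptions, a file is retrieved by reading its tuples in increasing Offset order and concatenating the Contents values, and the Size field tells the reader how many pieces to expect.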
4. Program Design
Our implementation consists of two main programs: Fdui.java and Fdbase.java.
Fdui.java is our user interface. Currently, it supports individual file retrieval, listing of
files and directories, and a recursive delete. Fdbase.java implements file retrieval and listing
through calls to the MySQL database using the JDBC API.
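As an illustration of how a listing call might be answered from the FileMetaData table, consider the sketch below. It is not the code from Fdbase.java: the class name FileListing, the query, and the trailing "/" used to mark directories are hypothetical, and the query assumes that the Path column holds the directory that contains each entry.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class FileListing {
    // Assumption: Path stores the containing directory of each entry.
    static final String LIST_SQL =
        "SELECT FileName, IsDirectory FROM FileMetaData WHERE Path = ? ORDER BY FileName";

    // Returns the names of the entries stored under directoryPath.
    static List<String> list(Connection conn, String directoryPath) throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(LIST_SQL)) {
            ps.setString(1, directoryPath);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(format(rs.getString("FileName"), rs.getString("IsDirectory")));
                }
            }
        }
        return names;
    }

    // Appends "/" to directory names, using the "Y"/"N" IsDirectory flag.
    static String format(String fileName, String isDirectory) {
        return "Y".equals(isDirectory) ? fileName + "/" : fileName;
    }
}
```

A parameterized statement is used so that directory names containing quotes cannot change the query.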
5. Timing Method
We carried out a timing comparison between the list method of java.io.File and our listing
function. For this test, we created identical file structures in the actual file system and in
the database, and then ran each listing function on its file structure. The graph below shows
the results.

The first iteration incurred a large penalty compared to later iterations. This may be due to
memory accesses. Usually, our file system's listing procedure took four to five times as long as
the java.io.File method. Occasionally, as in iteration 7, the java.io.File method took longer to
respond. This may be due to garbage collection; more tests are needed to verify this.
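The java.io.File side of such a measurement could be written as in the sketch below. It is an assumption about how the timing might be done, not the code used to produce the results above: the class name ListTimer is hypothetical, and System.nanoTime is used as the clock, which requires a newer JDK than 1.2.

```java
import java.io.File;

class ListTimer {
    // Times a single File.list() call on dir and returns the elapsed nanoseconds.
    static long timeOnce(File dir) {
        long start = System.nanoTime();
        String[] names = dir.list();
        long elapsed = System.nanoTime() - start;
        if (names == null) {
            throw new IllegalArgumentException("Not a readable directory: " + dir);
        }
        return elapsed;
    }

    // Repeats the measurement so that each iteration can be reported separately.
    static long[] timeIterations(File dir, int iterations) {
        long[] times = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            times[i] = timeOnce(dir);
        }
        return times;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        long[] times = timeIterations(dir, 10);
        for (int i = 0; i < times.length; i++) {
            System.out.println("iteration " + (i + 1) + ": " + times[i] + " ns");
        }
    }
}
```

Reporting every iteration, instead of only an average, keeps a slow first iteration and occasional outliers visible in the data.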
FUTURE WORK
The main purpose of this project was to create a platform for future work. The current system
can be tested in several further ways. One is to measure performance with a deeper directory
structure and with more files, up to 100. To make this test feasible, a method for adding real
files to the database needs to be implemented. Another possible addition is concurrency control.
The methods used to access the database file system can also be optimized for MySQL. Finally, a
better user interface can be developed. Overall, there is plenty of room for future development
and research.
APPENDIX:
1. fdui.java
2. Fdbase.java
3. metadatatable.txt
4. filecontenttable.txt
5. insertcontents.sql
6. insertmetadata.sql
7. structure.txt