by Patricia Hassett and Linda Roberge
Published October 15, 2002
Patricia Hassett is Professor of Law at Syracuse University. Formerly a prosecuting
attorney and a municipal government attorney, Professor Hassett served with the
Lord Chancellor's Advisory Committee on Legal Education and Conduct in England,
advising on the education and professional conduct of persons providing legal
services. She has also served as a consultant to the English Home Office on a
project to improve the quality of bail decisions. Professor Hassett writes in
the field of artificial intelligence and the law and has constructed a
prototype of an expert system that makes bail recommendations. She teaches
courses in constitutional criminal procedure, artificial intelligence and law,
and criminal law.
Linda Roberge is Assistant Research Professor at Syracuse University. Her Ph.D.
is in Information Systems with supporting fields of Statistics and Public
Administration. Her research interests include the use of information
technology by the legal and medical professions.
Introduction
Lawyers, along with other professionals, are looking at the many advantages that
information technology holds for their profession. In this paper we
discuss a new class of information that goes beyond databases[1] to
the realm of data warehouses[2] and data mining[3]. These state-of-the-art
technologies and the information they produce promise to redefine some of the
best-practice standards of the legal profession.
Transactional
Records Access Clearinghouse (TRAC), a
research center at Syracuse University, has developed a web-based data warehouse/data
mining application that makes it possible to produce useful information from
previously inaccessible data. For years, investigative reporters, public
interest groups, Congressional committees and others have used TRAC’s
application, TRACFed. Now
the legal profession has discovered the power of TRAC.
But why
should lawyers care about data warehouses and data mining? Some of the
many situations in which the information could prove useful are highlighted
using examples drawn from TRACFed.
More About TRAC’s Data Warehouse and Data Mining Tools
The TRACFed
data warehouse includes among its many offerings transactional data from the US
Federal government concerning its enforcement and prosecution activities,
staffing, federal expenditures, and more. Like many others, TRAC’s data
warehouse is extremely large, currently occupying approximately 300 gigabytes
of storage space and growing monthly. Analyzing this much data to discover the
relationships of interest can be tricky, even for professional statisticians.
Lawyers and
legal researchers often are not trained to do even small data analyses. With
this audience in mind, TRAC developed a series of “point-and-click” style data
mining tools that put powerful analytical capabilities into the hands of
non-statisticians.
Why Should Lawyers Care?
Successful
lawyering in a particular case involves knowledge and understanding of both the
relevant legal rules and the workings of the specific court system in which the
case will be processed. Lawyers who work regularly in a particular court
will rely on their experience with the workings of the legal system to make
decisions about how to handle a case and what advice to give to their
clients. Attorneys with limited or no previous contact with a particular
court may be at a disadvantage because they will not have the same
understanding of the system that their more experienced colleagues have.
However,
even for the well-honed veterans, experience can be misleading. Before
giving advice based upon a perception of how the system works, careful lawyers
would like to know whether their personal perceptions are consistent with
actual facts.
TRACFed
allows lawyers to confirm their impressions with actual data. Do cases
really move more slowly through Judge Smith’s court? How frequently does
a particular prosecutor decline certain types of cases? What is the
likelihood that my client’s tax return will be audited? How often do
criminal cases investigated by a particular agency result in a conviction?
With the resources now available in TRACFed, the practicing lawyer needs
to be aware that the best-practice standards of the profession are likely to
change. No longer will it be acceptable to rely on the impressions,
hunches, and anecdotes that have formed the basis for experiential knowledge to
date. We have already seen how information sources like Westlaw and
Lexis/Nexis have used technology to change best-practice standards regarding
the knowledge of legal rules. Now data warehouses of legal system
information (coupled with data mining tools) are likely to change the
best-practice standards relating to knowledge of legal systems.
Examples
Case 1: You are bringing an
employment discrimination suit against the government. You feel you have
a really strong case but are concerned about a couple of things. First,
what is the typical amount of relief awarded? You decide to use TRACFed’s
Civil Layer with the Going Deeper Tool[4]. (Please see the Quick Start Guide
for more information.) With a few clicks of the mouse, you discover that
out of 1,446 employment litigation cases disposed of in 2001 in U.S. District
courts, only 327 were granted monetary relief. That’s less than
25%! And for those cases where relief was awarded and recorded, the
median amount (meaning half got more and half got less) was only $35,000.
You “drill down” by clicking on the link for Year 2001 and discover that in
your district, the percentage of cases where relief was awarded was even
lower. Again, you drill down to look at the individual records for your
district. Here you discover that the news isn’t all bad. The two
lowest awards were handled by an Assistant U.S. Attorney who has since left the
office!
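The arithmetic behind these figures is easy to check. The short Python sketch below recomputes the share of cases with monetary relief from the two counts quoted above. The award amounts in the second half are made-up values, chosen only to illustrate what a median is; they are not TRACFed data.

```python
from statistics import median

# Counts quoted in the text: employment litigation cases disposed of in 2001.
cases_disposed = 1446
cases_with_relief = 327


def share(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


relief_pct = share(cases_with_relief, cases_disposed)
print(f"Cases granted monetary relief: {relief_pct:.1f}%")

# Hypothetical award amounts (NOT TRACFed data). The median is the middle
# value once the awards are sorted: half are higher and half are lower.
hypothetical_awards = [5_000, 20_000, 35_000, 60_000, 250_000]
median_award = median(hypothetical_awards)
print(f"Median of the hypothetical awards: ${median_award:,}")
```

Note that the single large award in the hypothetical list does not pull the median upward the way it would pull an average, which is one reason a median is a reasonable summary of "typical" relief.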
Second, your
client is very discouraged at this point and wonders how long this whole thing
is going to drag on. You know that the judge assigned has been pulled out
of retirement, but you don’t know much else. Using the Express Tool[5] on TRACFed's
Judges Layer, you discover that this particular judge has heard many
civil cases since his retirement, so you can probably find someone who has been
before him. You also discover that the average processing time for
the small number of cases he closed in 2001 where the government was the
defendant was 2,059 days!
Case 2: Your client went
through an IRS audit several years ago; it was an experience she doesn’t want
to repeat. Not only did she have to spend a considerable amount of time,
but she was also assessed additional taxes which created a considerable
burden. She wants to know what the chances are that she will be audited
this year.
You use TRACFed’s
Administrative Layer area to gather some information for your
client. Using the Express Tool, you
discover that for her type of return (individual with income greater than
$100,000), less than 1% (0.38%) of returns were audited in 2001. However,
this is a little over twice the audit rate for all returns. One piece of
good news is that the audit rate has been steadily declining over the past five
years. Another is that she has moved to a district where the audit rate
has been consistently lower than it was in her previous district.
But what is
likely to happen if she does get audited? Again TRACFed has
some answers. Clicking on …more on audits and focusing
on her income class, you find that nationally 80% of the audits result in a tax
change and the average amount of taxes and penalties in 2001 was $27,061.
In the
unlikely event that she is audited again this year, there is a lot more
information you could produce including installment agreements and
offers-in-compromise. All you need to do is to select a different Table
Topic from the menu of choices.
Case 3: Your client, an influential
doctor, has notified you that the FBI has begun an investigation of his
practice. Federal agents have been interviewing staff and are seeking a
court order to review practice records. He is particularly concerned that
he and his partners may not have strictly followed all the Medicare guidelines
regarding coding for surgeries. He is afraid that the government will
want to make an example of a high profile practice such as his. He wants
to know what has been going on lately in the area of health care fraud.
Although you certainly have some impressions about what has been going on, you decide to
test your impressions with TRACFed. Using the Analyzer Tool[6] on the
Criminal Layer, you specify the data slice by choosing your
district and the program category of Health Care Fraud for the year 2001.
You discover that of the 182 referrals disposed of in your district in 2001,
125 were not prosecuted. Of the 57 that were prosecuted, over three
quarters (44) were convicted.
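These counts can be turned into rates with a few lines of Python. The sketch below uses only the four numbers quoted above; the variable names are our own, and the "overall" rate is simply one additional way of reading the same figures.

```python
# Counts quoted in the text: health care fraud referrals disposed of
# in one district in 2001.
referrals_disposed = 182
not_prosecuted = 125
prosecuted = 57
convicted = 44

# Sanity check: every disposed referral was either prosecuted or not.
assert not_prosecuted + prosecuted == referrals_disposed

declination_rate = not_prosecuted / referrals_disposed
conviction_rate_if_prosecuted = convicted / prosecuted
conviction_rate_overall = convicted / referrals_disposed

print(f"Not prosecuted:            {declination_rate:.1%}")
print(f"Convicted, if prosecuted:  {conviction_rate_if_prosecuted:.1%}")
print(f"Convicted, all referrals:  {conviction_rate_overall:.1%}")
```

The distinction between the last two rates matters when advising a client: the chance of conviction once a case is prosecuted is much higher than the chance that a referral ends in conviction at all.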
Wondering
what that meant in terms of prison time, you use the Explore and List
features and select Convictions as the stage to zero in on. You discover
that of those convicted, 28 received no prison time; the others received
sentences ranging from 2 to 30 months. This, then, is probably the worst-case
scenario.
You probably
should look at fines too, but decide to think positively and look more closely
at the declinations. You generate a table that shows how many
prosecutions were declined for particular reasons. You find that the reason
given for over half the declinations was "Lack of evidence of criminal
intent"! You generate another table that separates these numbers by
investigating agency, and find that the FBI does only a little better than
other agencies at finding criminal intent. Raising the "lack of
criminal intent" issue seems like a potential strategy if the FBI decides
to refer your client's case to the U.S. Attorney.
One of the
things you noticed while looking at individual records was that all were marked
'N' for national priority. You wonder whether priorities have changed since
9/11. You create a data slice of records in 2002 and find that many are now
marked as both national and district priority. That certainly can't help!
You also find two new pieces of information: 1) the only new cases to receive immediate
declinations due to weak evidence were investigated by the Secret Service
rather than by the FBI; and 2) the referrals coming from the FBI seem to be going
to one of three prosecutors. You don’t recognize one set of prosecutor initials,
so you want to look into the Staffing Layer to find the name
and seniority of this particular prosecutor. But that can wait for
another session.
There is much more digging you would like to do. Luckily, the work you’ve done is
stored in your Web Locker[7] so you can return to it tomorrow.
Summary
These three
cases provide readers with a small taste of the types of information they can
produce using TRACFed. Because the data mining tools are
easy to use, lawyers are able to generate sophisticated analyses to help them
plan more effective strategies than was previously possible. For
attorneys who practice in Federal District Courts, TRACFed is an
invaluable resource.
The Transactional Records Access Clearinghouse is a not-for-profit research center located at
Syracuse University. The center has been supported by the university and
by grants from the Rockefeller Family Fund, the New York Times Company Foundation,
the John S. and James L. Knight Foundation, the Beldon Fund and the Open
Society Institute. User subscription fees from TRACFed help
to defray costs associated with updating and maintaining the data warehouse and
data mining tools.
For more
information see TRAC's public website at http://trac.syr.edu
and TRACFed at http://tracfed.syr.edu.
Notes
[1] Database - A collection of information. A
transactional database records and tracks the individual activities or
transactions of an organization. For example, when a government employee is
hired, information about the employee and his/her job is recorded in a
transactional database. As information about the employee changes (e.g.,
salary, work schedule, or grade) the database is updated. A transactional database
contains what is often referred to as “live” data that support the operations
of an organization.
[2] Data Warehouse - An integrated
compilation of data from various sources. Data warehouses differ from
transactional databases in several significant ways. First, warehouses consist
of one or more transactional databases that are integrated. Second, in addition
to transactional records, warehouses may contain summarized data that can also
be integrated with the transactional data. And finally, warehouses contain
historical data that is updated periodically, often quarterly or yearly, rather
than “live” data that is constantly being updated in real time. Data warehouses
are constructed to facilitate decision-making and answer questions.
[3] Data Mining - The process of searching for
trends, relationships, and patterns in large amounts of data, often from a data
warehouse. Finding these hidden relationships is really the process of data
analysis.
[4] Going Deeper Tool - The Going Deeper Tool
provides users a powerful yet easy-to-use "drill-down" capability. This
allows users to start with aggregated data and drill down all the way to
individual case-by-case information on criminal and civil matters and to
individual employees. Using a point-and-click interface, users generate a
query that returns a series of linked tables, each of which relates to an increasingly
narrow subset of data. The final drill shows the individual records that
make up the subset of data that was selected.
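The drill-down idea described in this note can be sketched in a few lines of Python. This is only an illustration of the general technique, not TRACFed's implementation: the records, field names, and the helpers `drill` and `narrow` are all hypothetical.

```python
from collections import Counter

# Hypothetical case records (NOT TRACFed data); field names are assumptions.
records = [
    {"case_id": "C-1", "year": 2000, "district": "District A", "relief": 0},
    {"case_id": "C-2", "year": 2001, "district": "District A", "relief": 12000},
    {"case_id": "C-3", "year": 2001, "district": "District A", "relief": 0},
    {"case_id": "C-4", "year": 2001, "district": "District B", "relief": 40000},
    {"case_id": "C-5", "year": 2001, "district": "District B", "relief": 0},
    {"case_id": "C-6", "year": 2001, "district": "District B", "relief": 55000},
]


def drill(rows, field):
    """Summarize rows by one field: one aggregated table in the series."""
    return Counter(row[field] for row in rows)


def narrow(rows, **criteria):
    """Keep only the rows matching every criterion: the subset behind a link."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]


by_year = drill(records, "year")               # level 1: totals per year
rows_2001 = narrow(records, year=2001)         # follow the "2001" link
by_district = drill(rows_2001, "district")     # level 2: districts in 2001
final_records = narrow(rows_2001, district="District A")  # level 3: records

print(dict(by_year))
print(dict(by_district))
for record in final_records:
    print(record)
```

Each level works on the subset chosen at the level above, which is why the tables become progressively narrower until only individual records remain.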
[5] Express Tool - The
Express Tool provides a simple means of quickly retrieving information from
TRAC data warehouses. Express enables users to examine and compare broad
categories of information, focus on particular geographic regions or topics,
generate rankings, make comparisons, and find trends. Using pull-down
menus, users can build dynamic queries based on the available options. Results
can be tailored through the query builder so that the desired information is
returned in alphabetical order, in national ranking order, or as graphical or map
displays. Express is a great jumping-off point for in-depth inquiries.
[6] Analyzer Tool - The Analyzer
Tool lets you, with a point-and-click interface, create your own data slice on
a selected subject (e.g., civil rights or the environment), a specific agency,
or a particular statute. The data slice is stored in your own individual Web
Locker where it is available for further mining and analysis using the Analyzer
Tool's power features. Analyzer has four power features. List enables you
to display individual records in their entirety, like the list that can be
generated in Going Deeper. Explore lets you examine the makeup of your data
slice along a number of user-specified relevant categories. Focus allows you to
undertake the same close examination of your data slice using capabilities
similar to those found in the Express Tool. Rank enables the same ranking
analysis found in the Express Tool.
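As a rough illustration of what a data slice and three of these power features do, consider the Python sketch below. It is a minimal sketch under stated assumptions: the `Referral` type, its fields, and every record are hypothetical, and the code mimics the ideas of List, Explore, and Rank rather than reproducing the Analyzer Tool itself.

```python
from collections import Counter
from typing import NamedTuple


class Referral(NamedTuple):
    """Hypothetical referral record; the fields are assumptions."""
    ref_id: int
    year: int
    program: str
    agency: str
    outcome: str


# Hypothetical records (NOT TRACFed data).
referrals = [
    Referral(1, 2001, "Health Care Fraud", "Agency A", "declined"),
    Referral(2, 2001, "Health Care Fraud", "Agency A", "convicted"),
    Referral(3, 2001, "Health Care Fraud", "Agency B", "declined"),
    Referral(4, 2001, "Health Care Fraud", "Agency B", "declined"),
    Referral(5, 2001, "Tax Fraud", "Agency C", "convicted"),
    Referral(6, 2002, "Health Care Fraud", "Agency A", "declined"),
]

# Data slice: one program category in one year.
data_slice = [
    r for r in referrals
    if r.year == 2001 and r.program == "Health Care Fraud"
]

# "Explore": break the slice down along chosen categories (agency x outcome).
explore = Counter((r.agency, r.outcome) for r in data_slice)

# "Rank": order agencies by their number of declined referrals.
rank = Counter(r.agency for r in data_slice if r.outcome == "declined").most_common()

# "List": show the individual records at one stage, here convictions.
listed = [r for r in data_slice if r.outcome == "convicted"]

print(explore)
print(rank)
print(listed)
```

Because every feature operates on the same stored slice, the slice only has to be specified once, which is the role the note assigns to the Web Locker.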
[7] Web Locker - The Web Locker is an online data
repository specific to an individual user. It stores the data slices and output
of the Power Analysis Features generated by users with the Analyzer Tool.
Retrieved from: http://www.llrx.com/features/tracfed.htm