
IRS trudges on with aging computers

More than 20 years after its first attempt at a massive upgrade of 1960s-era mainframes, the IRS still isn't sure when the project will be finished.

Anne Broache Staff Writer, CNET News.com
Anne Broache
covers Capitol Hill goings-on and technology policy from Washington, D.C.
Tech follies

The Internal Revenue Service has been trying for years to upgrade its antiquated mainframe computers, which process Americans' tax returns by churning through millions of lines of assembly code written by hand in the early 1960s.

But after more than 20 years and over $5 billion, there's still no end in sight. Not all computer systems can talk to each other, information isn't available in real time, and tax returns filed on paper are often manually entered by typists.

An internal strategy document written seven years ago likened the upgrade task to redesigning and rebuilding a densely populated city like New York, without evacuating it first or disrupting the "daily pattern" of the residents' lives.

IRS' ailing computers

To run the numbers on tax returns, the IRS relies on computer systems from the Kennedy administration.

1960s
IRS designs and launches Master File system, still its primary repository of taxpayer information today.

1970s
The Integrated Data Retrieval System comes into existence.

1986
IRS receives 25,000 tax returns filed by modem in the first year it offers its e-File setup, begins its Tax Systems Modernization effort.

1997
After spending more than $3 billion, agency scraps TSM project and issues "blueprint" for renamed Business Systems Modernization project.

1998
Congress passes the Restructuring and Reform Act of 1998, setting goal for IRS to have at least 80 percent of tax returns filed electronically by 2007.

2000
Then-IRS Commissioner Charles Rossotti releases his modernization vision, compares task ahead to rebuilding a large city.

2004
IRS launches Modernized e-File system, which allows businesses to file tax returns online. Releases first version of Customer Account Data Engine, the intended successor to Master File.

2006
IRS releases updated Modernization and Vision Strategy that prioritizes systems to replace Master File and IDRS.

2007
IRS rapped again for losing track of computer equipment and falling down on security.

The IRS' long-term goal is to run its operations with the efficiency Americans expect of banks and credit card companies, but it has consistently fallen short. Right now, for instance, a taxpayer who submits a tax return on a Monday will likely find that it will not be processed until at least the following weekend, thanks to limitations in the antiquated core of the agency's tax-processing apparatus. Over $3 billion was wasted in an earlier upgrade attempt in the 1990s. Last year, computer problems caused the IRS to erroneously hand out an estimated $318 million in fraudulent refunds.

Government audits show that the many years of planned upgrades have been dogged by the same missteps that plague so many massive government computer upgrades: inadequate management, ill-defined goals, repeated cost overruns, and failure to meet deadlines and expectations. (Earlier articles in this occasional CNET News.com series have explored computer systems at Homeland Security and the FBI.)

"They have made advances, and there has been incremental progress and success, but they still struggle," Margaret Begg, an assistant inspector general for audit, told CNET News.com in an interview. Begg specializes in evaluating the U.S. Department of Treasury's tax-related computer systems.

The U.S. Government Accountability Office, which has long warned of the risks associated with this complex project, reached a similar conclusion in a February report to Congress. The auditors said the IRS had made improvements, but its future strategy remains worrisome because it lacks clear deadlines for "consolidating and retiring legacy systems."

Losing at least $318 million
IRS Chief Information Officer Richard Spires says there's reason to believe things are looking up. In the three years since he joined the agency, Spires said in an interview this week, the IRS has refined its management processes, hired more talented people, and wised up to the perils of taking on more than it can handle.

"There are no guarantees in this world in the sense of what could happen in the future, but I think the confidence level in us and the confidence level in our team has grown," he said. Spires, a former executive at software developer Mantas who has degrees in electrical engineering, became the IRS' assistant chief information officer in 2004 and was promoted to the CIO post in September.

One potentially major change involves the scope of the overhaul. Up until about two years ago, officials had been thinking the IRS needed to replace all the legacy systems--at a cost of about $8 billion over 15 years. The agency backed away from that idea in its latest modernization plan (PDF), released in October, saying it intended to include at least some components of the older systems in the transformation.

"I think it's the prudent thing to do given our management bandwidth and the dollars we're going to get," Spires said, adding that many of the older COBOL-based systems are still "very maintainable." Created in 1959, COBOL (Common Business-Oriented Language) is a mainstay of mainframe programming but has been criticized for its verbosity and for, in older versions, not supporting local variables. Local variables are fundamental to modern programming techniques.

Congress has written checks for significantly less money each year for the upgrades than some at the agency had originally expected, but the IRS has nonetheless spent $2.1 billion on its overall modernization efforts since the most recent phase began in 1999, according to Spires. (An earlier effort, which began in 1986 and was ultimately scrapped in 1997, cost more than $3 billion.)

Richard Spires

And that's not counting the costly losses tied directly to botched upgrades. Last year, for example, Treasury Department auditors estimated that the IRS handed out at least $318.3 million in fraudulent refunds because a new Web-based fraud detection database, originally slated for use in 2005, was still not ready in time for last year's tax-filing season.

The gaffe occurred because the old system had already been dismantled months earlier amid assurances from the contractors involved--overseen by Computer Sciences Corporation--that the new system would arrive on schedule. The old version has now been restored for this year's tax season, Deputy Inspector General Michael Phillips told the U.S. Senate Finance Committee at a hearing this week.

Beyond sky-high costs, building such a massive system from scratch today is simply "too complicated," said Peter Neumann, a computer scientist with SRI International who served on an IRS advisory panel during the Clinton administration. "The fact that any high-school kid in the world can break into a system that's on the Internet right now and that the serious aggressors have relative ease in getting into systems that they shouldn't be getting into, means the problem is much more difficult now than it was then."

To be sure, the IRS has made progress on some technological fronts. The mere act of keeping an elderly system--written in assembly language--able to process tax returns when federal law changes every year is a significant ongoing task.

Electronic tax return filing is another area enjoying relative success. The number of returns e-filed by individual taxpayers has been steadily climbing, and since 2004, corporations have had the option of filing their returns via the Internet. This year, about 21 million taxpayers have already logged on to a new "Where's my refund" Internet application to keep tabs on their expected checks.

Origins in the early days of computing
The year 1962 didn't just bring the first industrial robots to General Motors' vehicle assembly lines and the inaugural Earth-orbiting voyage by an American astronaut. It also marked the debut of Master File, an IRS mainframe system that even now remains its most important repository of taxpayer records.

The system, written in low-level assembly language, involves a series of very large magnetic tape files--one set dedicated to individual taxpayers, one for businesses, and others for holding data that doesn't fit into either category. (Assembly language relies on nearly inscrutable commands such as "A R5,0(R0,R8)," which means add the word in memory at the address held in register R8 to the value in register R5.)
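For readers curious what that single instruction does, here is a minimal Python sketch. It assumes the instruction follows the conventions of IBM's System/360 family (the article does not name the machine), where naming register 0 as the index register means no index is applied. The function name `add_fullword` and the sample values are hypothetical, chosen only for illustration.

```python
def add_fullword(regs, memory, r1, disp, index, base):
    """Model an RX-format add: regs[r1] += the 4-byte word at the effective address."""
    # Naming register 0 as index or base means "no register", so it adds nothing.
    address = disp
    if index != 0:
        address += regs[index]
    if base != 0:
        address += regs[base]
    # Read a big-endian signed 32-bit word from memory.
    word = int.from_bytes(memory[address:address + 4], "big", signed=True)
    # Wrap to 32 bits; this sketch ignores overflow and condition-code handling.
    regs[r1] = (regs[r1] + word + 2**31) % 2**32 - 2**31
    return regs[r1]


regs = [0] * 16
memory = bytearray(256)
regs[5] = 10                      # running total in R5
regs[8] = 100                     # R8 holds the address of the operand
memory[100:104] = (32).to_bytes(4, "big", signed=True)

print(add_fullword(regs, memory, r1=5, disp=0, index=0, base=8))  # prints 42
```

The point of the comparison is that one line of a higher-level language (`total += value`) hides the register and address bookkeeping that an assembly programmer has to spell out by hand.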

Those sequential files, stored primarily at a facility in Martinsburg, W.Va., are only capable of receiving batch updates on a weekly basis. Those delays mean that taxpayers might receive repeat notices for problems that have already been resolved, and IRS representatives don't always have the most up-to-date account information at their disposal.
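A rough sketch of why a sequential file lends itself to batch updates: the week's changes are queued, sorted, and then merged against the master records in a single pass. The record layout below (a taxpayer ID paired with a balance) and the function name `apply_weekly_batch` are hypothetical simplifications, not a description of the actual Master File format.

```python
def apply_weekly_batch(master, transactions):
    """Apply a week's queued transactions to a sequential master file in one pass.

    Both inputs are lists of (taxpayer_id, amount) tuples sorted by taxpayer_id.
    Returns the new master list and any transactions with no matching account.
    """
    new_master = []
    unmatched = []
    i = 0
    for taxpayer_id, balance in master:
        # Set aside transactions for IDs that do not appear in the master file.
        while i < len(transactions) and transactions[i][0] < taxpayer_id:
            unmatched.append(transactions[i])
            i += 1
        # Post every queued transaction for this account.
        while i < len(transactions) and transactions[i][0] == taxpayer_id:
            balance += transactions[i][1]
            i += 1
        new_master.append((taxpayer_id, balance))
    unmatched.extend(transactions[i:])
    return new_master, unmatched


master = [(101, 500), (102, 0), (103, 250)]     # balances owed at last week's run
queued = [(101, -500), (103, -50), (103, -25)]  # payments received during the week

new_master, unmatched = apply_weekly_batch(master, queued)
print(new_master)  # [(101, 0), (102, 0), (103, 175)]
```

Until the batch runs, account 101 in this example still shows $500 owed even though the payment has arrived, which is the kind of lag that can produce a repeat notice.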

It's also growing trickier to find programmers skilled in a language that has long dropped out of favor in computer science curriculums. The IRS has occasionally been forced to bring in outside trainers to ensure its employees can make even basic system updates arising from annual tax law changes.

Each week, certain records are extracted and placed on a separate system called the Integrated Data Retrieval System (IDRS), which allows IRS employees at a handful of service centers to query and to send updates to the files. The IRS has acknowledged that the system, which was designed in the 1970s and was last overhauled in 1985, takes excessive time to master in part because it requires entering hard-to-remember codes, rather than standard business English, to get anything done.

The IRS has since deployed more than 500 separate computer systems to handle facets of the process by which some 200 million tax returns are vetted every year. Many of them were designed throughout the years simply to provide workarounds or extract particular data from Master File and IDRS, which it considers the heart and soul of its operations.

Its highest priority project is a new system called the Customer Account Data Engine, or CADE, which is supposed to serve as the successor to Master File. CADE's data stores will be updated daily, thus allowing readier access to up-to-date account data and quicker processing of tax refunds.

But numerous delays and glitches have peppered CADE's development. The system was conceived in 1999, and its first release, which can deal with only one kind of tax return, was slated for rollout in early 2002. It wasn't deployed until July 2004, however, in part because of "significant breakdowns" on the IRS' part in laying out what requirements the system was expected to meet and in testing to make sure it didn't fall short, the GAO said in a report last year. CADE is expected to process 17 million to 19 million returns this year--up from about 7.4 million last year--but delays in the latest software release mean the IRS will fall short of the 33 million returns it hoped the system would process by now.

Another planned system called Account Management Services, which will work with CADE, is supposed to tackle another obstacle: getting the disparate IRS systems to talk to each other. Right now, IRS telephone representatives often have to make callers wait as they manually toggle among different information stores, and employees in the agency's enforcement wing can't run algorithms as sophisticated as they otherwise could to determine who should and shouldn't receive an audit.

What's still unclear is when the IRS will be able to shelve its aging systems for good. Spires, the IRS' IT chief, maintains his department is being "aggressive," but he said it would be "inappropriate" to commit to hard deadlines for even the most critical projects without a better idea of how much funding lies ahead.

"We don't really control our own fate here around budgets and the like," he said. "I think we've delivered an awful lot for what we've expended."