A Workshop on the Evaluation of Registration Algorithms on Standard Images

Summary

This is a description of a workshop held from 8:00 to 10:00 pm on Wednesday, February 14, 1996, at SPIE's International Symposium, Medical Imaging 1996, in the Pacific Ballroom of the Newport Beach Marriott Hotel in Newport Beach, California. Approximately fifty researchers interested in medical image registration attended. The meeting was chaired by J. Michael Fitzpatrick of Vanderbilt University.

Purpose

The purpose of the Workshop was to promote discussion of methods for the objective evaluation of image registration algorithms. Many of those in attendance were already taking part in such an evaluation, sponsored by the National Institute of Neurological Disorders and Stroke (NINDS) and entitled “Evaluation of Retrospective Image Registration” (1 R01 NS33926-01, Fitzpatrick, principal investigator).

The NINDS Project

The NINDS project, which began in early 1995, uses the Internet to distribute images of 10 patients from Vanderbilt University to several sites outside Vanderbilt (eleven groups at ten sites as of the time of this workshop). Investigators at each site apply their registration techniques to pairs of images acquired from each of these patients to determine a rigid transformation that brings each pair into alignment. The investigators then convert the transformations into a standard format developed for this project and email them to Vanderbilt, where they are compared with gold standards. The gold standard for each image pair is the transformation determined by a prospective registration system based on bone-implanted fiducial markers, developed at Vanderbilt for use in image-guided neurosurgical navigation. Registration pairs include both CT/MR and PET/MR. The evaluation is called blinded because (1) the investigators outside Vanderbilt are not given the gold standard transformations until after their registrations have been submitted and evaluated, and (2) all traces of the fiducial markers are removed from the images before they are distributed. Statistics on the accuracy of the registration methods are calculated and tabulated; by the time of this workshop, the eleven groups at the ten sites had been evaluated.

Structure of the discussion

The Workshop began with the chair thanking the co-chairs of the Image Processing track, Dr. Murray Loew of George Washington University and Dr. Ken Harrison of Los Alamos National Lab, as well as Ms. Donna Rode, SPIE Technical Programs Coordinator, and Mr. Randy Cross, SPIE Conference Manager, for their assistance in setting up the session. He then suggested a format for the session: some questions relevant to this and future registration evaluation projects were posed, and it was proposed that the discussion that followed be directed toward answering them. Before the open session began, the chair gave a brief overview of the history of the evaluation project and defined some terms commonly used in connection with image registration and its evaluation. What follows is that history, those terms, and a list of comments and observations made during the open discussion, organized according to the questions to which they relate.

History of Acustar and the NINDS project

While stereotactic frames have been used in brain surgery for almost fifty years to relate points in an image of the brain to the physical head itself, only a decade has elapsed since the earliest efforts were made to bring two volume images of the same head into registration with each other. In the intervening years a variety of promising techniques have been described and tested. Many of them involve CT-to-MR or PET-to-MR registration, and, unlike stereotaxy and other so-called “prospective” registration techniques, they require no special patient preparation before imaging. These “retrospective” registration techniques are useful whenever information from two modalities is needed for the same anatomical region.

In July 1988, two years into this decade of retrospective registration, Johnson & Johnson, through its subsidiary Codman and Shurtleff, Inc., funded at Vanderbilt a one-year project on prospective registration entitled “Three Dimensional Image Volume Registration and Reorientation Using Implantable Fiducial Markers”, with Fitzpatrick and Robert J. Maciunas of the Department of Neurological Surgery as co-principal investigators. That project was renewed yearly through 1996, with Robert L. Galloway of the Department of Biomedical Engineering added early on as a co-PI. The first paper on this project was presented in the Image Processing track at this Symposium four years ago [V. R. Mandava, J. M. Fitzpatrick, C. R. Maurer, R. J. Maciunas, and G. S. Allen, “Registration of multimodal volume head images via attached markers”, Proc. SPIE Medical Imaging VI, Vol. 1652, pp. 271-282 (February 1992, Newport Beach, CA)]. The project led to the development of the system that was named “Acustar” in 1993. Clinical trials of the Acustar system began in 1994, and the system was approved by the Food and Drug Administration for commercial use in December 1995.

In February 1993 Fitzpatrick and Maciunas attended the “Workshop on Computer-Assisted Surgery” in Washington, D.C., which was sponsored by the National Science Foundation and organized by Russell Taylor and George Bekey. At that meeting each of the five working groups concluded independently that some form of image registration was a critical problem. After Maciunas presented preliminary results (four patients) on marker-based registration at Vanderbilt indicating submillimetric accuracy, it was suggested during informal discussions that the growing set of registered images at Vanderbilt might serve as a standard for evaluating the accuracy of less invasive image registration methods. Later that year, after receiving further encouragement from attendees and from others who had not attended the meeting, Maciunas and Fitzpatrick prepared a new patient consent form and obtained permission to go forward with a proposal to the NIH that would allow investigators at remote sites to access the Vanderbilt database. In early 1994 Fitzpatrick sent letters to many prominent researchers in the image registration field asking whether they would be interested in taking part as investigators. To facilitate the communication of images and other data between Vanderbilt and the remote sites, it was proposed that the Internet be used. In May 1994 Fitzpatrick submitted the proposal, with Maciunas, Benoit Dawant of the Department of Electrical and Computer Engineering, Robert M. Kessler of the Department of Radiology, and representatives from each of the external sites as co-investigators.

On December 1, 1994, five days before the NIH review was received, the project began informally (and optimistically!) with the posting of a set of “practice” images on the Internet. Funding was in fact awarded in early 1995, with an official start date of March 20, 1995. Communication and formatting problems occupied much of the first few months of the project. By mid-summer few problems remained, and by early December most sites had provided a complete set of registrations. Between March and December 1995 three sites dropped out of the project, but by January 1996 the eleven remaining sites had completed all registrations. Once all registrations had been received at Vanderbilt, those submitted by each site were compared with the marker-based registrations, and statistics on their differences were reported in a paper presented at this same symposium earlier on the day of this workshop [J. West, J. M. Fitzpatrick, M. Y. Wang, B. M. Dawant, C. R. Maurer, Jr., R. M. Kessler, R. J. Maciunas, et al., “Comparison and evaluation of retrospective intermodality image registration techniques”, Proc. SPIE Medical Imaging 1996, Vol. 2710, pp. 332-347 (Newport Beach, CA)].

Definitions of terms

1.      Blindedness: This term is used here to denote the ideal that all retrospective registrations be submitted without any knowledge of the gold standard transformations.

2.      Estimated TRE: the distance between the image of an anatomical point under the registration transformation being evaluated and its image under the gold standard transformation. Note that error in the gold standard makes the expected value of a retrospective technique's estimated TRE larger than its true TRE.

3.      Fiducials: devices implanted in or attached to the patient during scanning, designed to be easily visible in the final image. These are the basis of prospective registration methods.

4.      Image-to-image registration: the determination of a one-to-one mapping between images such that identical anatomical points are mapped together.

5.      Image-to-physical registration: the determination of a one-to-one mapping between an object (typically a patient in the operating room) and an image of that object such that identical anatomical points are mapped together.

6.      Investigators: used herein to mean the participants in the registration evaluation project.

7.      Procrustes method: named after a mythological Greek innkeeper who mutilated his guests so that they would better fit his beds, this is a method of mapping one set of points onto another set of the same size (i.e., the same number of points) so that the mean square distance between corresponding points is minimized. The orthogonal Procrustes method is the special case in which the mapping is required to be rigid. This latter method is used to effect the registration in Acustar and in many other registration techniques that rely on the approximate alignment of corresponding points.

8.      Prospective registration: image-to-image or image-to-physical registration that requires preparation of the patient before imaging, typically the attachment of some sort of fiducial system, such as a stereotactic frame or fiducial markers, as an aid in determining the mapping. Prospective techniques are generally held to be more accurate than retrospective ones, but they also tend to be uncomfortable and/or invasive for the patient and are applicable only to images acquired after special preparation of the patient. The gold standard technique (i.e., Acustar) used in the NINDS evaluation project is prospective.

9.      Retrospective registration: image-to-image or image-to-physical registration that does not require preparation of the patient before imaging, such as the attachment of fiducials, i.e., registration that will operate on standard images obtained without any special preparatory steps. All the techniques tested in the evaluation project were retrospective.

10.  Rigid-body transformation, or rigid-body mapping: a mapping that preserves the distance between every pair of points, so that the distance between any two points is the same before and after transformation. For neurological images, registration is usually assumed to be a rigid-body problem. In the NINDS project the retrospective mappings were evaluated as rigid transformations.

11.  Stereotactic frame: a structure, usually cage-like, that is rigidly attached to and surrounds the patient's head. It is designed to be clearly visible in the scanning modalities used, providing many landmark positions for use by prospective registration techniques.

12.  Target Registration Error (TRE): the distance between anatomically identical points after a registration has been performed.

13.  Volumes of Interest (VOIs): locations at which the TRE was measured in the evaluation project in order to evaluate a registration technique.
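Definitions 2, 7, 10 and 12 can be made concrete with a short sketch. The Python code below is a minimal illustration, assuming NumPy and point coordinates in millimetres: it fits a rigid transformation to corresponding fiducial positions by the orthogonal Procrustes method, using the SVD-based solution often called the Kabsch algorithm, and then computes the estimated TRE at a set of target points by comparing a transformation under evaluation with a gold standard. The function names, the synthetic fiducial positions and the 1-degree rotation error are hypothetical; this is not the Acustar implementation or the evaluation code used in the NINDS project.

```python
import numpy as np


def orthogonal_procrustes(moving, fixed):
    """Return (R, t) minimizing the mean squared distance between
    R @ moving[i] + t and fixed[i]; both inputs have shape (N, 3)."""
    moving = np.asarray(moving, dtype=float)
    fixed = np.asarray(fixed, dtype=float)
    mu_m, mu_f = moving.mean(axis=0), fixed.mean(axis=0)
    # Cross-covariance of the two demeaned point sets.
    H = (moving - mu_m).T @ (fixed - mu_f)
    U, _, Vt = np.linalg.svd(H)
    # Force det(R) = +1 so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_f - R @ mu_m
    return R, t


def apply_rigid(R, t, points):
    """Map each row p of `points` to R @ p + t."""
    return np.asarray(points, dtype=float) @ R.T + t


def estimated_tre(R_test, t_test, R_gold, t_gold, targets):
    """Distance between each target's image under the transformation being
    evaluated and its image under the gold standard transformation."""
    diff = apply_rigid(R_test, t_test, targets) - apply_rigid(R_gold, t_gold, targets)
    return np.linalg.norm(diff, axis=1)


def rotation_z(deg):
    """Rotation by `deg` degrees about the z axis."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical fiducial positions (mm) in one image, and their noisy
    # localized positions in the other image after a known rigid motion.
    fiducials = rng.uniform(-80.0, 80.0, size=(4, 3))
    R_true, t_true = rotation_z(10.0), np.array([5.0, -3.0, 2.0])
    localized = apply_rigid(R_true, t_true, fiducials) + rng.normal(0.0, 0.3, size=(4, 3))
    R_gold, t_gold = orthogonal_procrustes(fiducials, localized)

    # A hypothetical retrospective result whose rotation is off by 1 degree.
    R_test, t_test = rotation_z(11.0), t_true
    targets = np.array([[0.0, 0.0, 0.0], [60.0, 0.0, 0.0], [0.0, 60.0, 30.0]])
    print(estimated_tre(R_test, t_test, R_gold, t_gold, targets))
```

Because the gold standard in this sketch is itself fitted from noisy fiducial positions, the printed distances include gold standard error as well as the error of the transformation being evaluated, which is the point made in definition 2.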

The open discussion

The discussion below has been arranged so that comments on the same subject are grouped together, regardless of their chronological order within the Workshop.

Is blindedness necessary? Was it achieved?

1.      It was generally felt that blindedness was important in order for the presented results to be trustworthy and meaningful. One of the participants in the project who could not be present at the workshop had sent this comment to the chair by electronic mail: “I am much more confident about the validity of the results knowing that the study was blinded. With registration techniques often involving a certain degree of manual intervention, it is too easy with such a small data set to ‘tune’ the registration procedure based upon knowledge of the end result.”

A crucial step in this study, and one that occupied the Vanderbilt team early on, was the removal from the images of all traces of the fiducial markers that had been used to determine the gold standard transformations. There was general agreement that this step had been successful.

On the other hand, serious concern was expressed that some sites had been allowed to reregister images after their “final” transformations had been submitted. These sites had uncovered errors in the conversion of their transformations to the standard transformation tables used in the study. The chair gave a detailed account of these exceptions, and one of them was described by the investigator involved: in this case, rotation about one axis had been consistently reversed in the conversion to the standard format. This claim was checked independently at Vanderbilt, found to be true, and corrected. It was requested that this process be described in more detail in the paper (referenced above), and the chair agreed to add this detail. [The co-chairmen of the program committee for the Image Processing track of this symposium were approached after the workshop. They agreed to allow a resubmission with this added detail. The details are present in the published paper.]

What other gold standards are there?

1.      One of the investigators referred to recent work that used cadavers with implanted tubes as a gold standard for evaluation. That study was also blinded, as the tubes were removed from the images before the techniques being evaluated were applied.

2.      One investigator suggested that stereotactic frames might provide such a standard. Another investigator voiced concern about registration using stereotactic frames: some evaluations of the accuracy of some of these systems suggest that they are unsuitable for use as a gold standard.

What makes a good set of standard images?

1.      It was pointed out that all the patient datasets used in the project were very similar. The general consensus was that it would be an improvement to make available a wider range of image resolutions, acquisition parameters, and so on.

2.      It was suggested that a collection of images of varied resolution could be created by acquiring high-resolution images of each patient and subsampling them to produce lower-resolution data. It was also pointed out, however, that a high-resolution T2-weighted MR image would be difficult to obtain in practice.

3.      Concern was raised about the difficulty of presenting comprehensive statistics on registration results when a large spectrum of image types and resolutions is used: the total number of image registrations would be too large.

4.      The question of how sensitive registration techniques are to the particular acquisition protocol was raised. This led to a lengthy discussion of the ethics of “tuning” algorithms to work better on the particular type of data present in an evaluation study. (N.B.: this concern is not to be confused with the concern raised above about tuning based on knowledge of the end result.) It was pointed out that several of the investigators had modified or developed techniques during the course of the project, and opinions were sharply divided over whether this should be permissible in a blinded study.
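The subsampling suggestion in item 2 can be sketched as block averaging of a high-resolution volume. This is a hypothetical illustration assuming NumPy and a volume stored as a 3-D array; the function name and the reduction factors are illustrative only. Block averaging merely approximates a genuinely lower-resolution acquisition, since it ignores scanner-specific effects such as slice profile and k-space sampling.

```python
import numpy as np


def block_average(volume, factors):
    """Downsample a 3-D volume by averaging non-overlapping blocks.

    `factors` gives the integer reduction along each axis, e.g. (1, 1, 3)
    to turn 1 mm slices into 3 mm slices. Trailing voxels that do not
    fill a whole block are discarded."""
    v = np.asarray(volume, dtype=float)
    fx, fy, fz = factors
    nx, ny, nz = (s // f for s, f in zip(v.shape, factors))
    # Crop to a whole number of blocks along each axis.
    v = v[: nx * fx, : ny * fy, : nz * fz]
    # Split each axis into (block index, offset within block), then average
    # over the offsets.
    return v.reshape(nx, fx, ny, fy, nz, fz).mean(axis=(1, 3, 5))


if __name__ == "__main__":
    # Hypothetical 1 mm isotropic volume standing in for a high-resolution scan.
    hi_res = np.random.default_rng(0).normal(size=(64, 64, 60))
    for factors in [(1, 1, 2), (1, 1, 4), (2, 2, 4)]:
        print(factors, block_average(hi_res, factors).shape)
```

Each choice of factors yields one more resolution level per patient, which also illustrates the concern in item 3: the number of image pairs to register grows with every level added.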

Was patient confidentiality maintained?

1.      In this project all patient information was stripped from the image volumes before they were placed into the Internet database. Access to these images was then password-protected. It was generally agreed that patient confidentiality was maintained by this procedure.

How can user dependence and mistakes be reduced?

1.      It was suggested that, in the future, the evaluating site might provide computer source code that would allow investigators to reformat an image volume according to the registration transformation specified in their result submissions. Comparing this reformatted image with the one produced by the registration algorithm itself would be a step toward ensuring that the submission specified the correct transformation. It was noted that some unexpectedly large estimated TRE values had been calculated for some of the registration techniques in the project, and that these values might have been caused by a problem in conversion to the evaluation format rather than by poor registration. This conversion step would not ordinarily be part of a clinical protocol.

2.      To reduce user dependence, the possibility was raised that the evaluating site perform all registrations itself, using source code provided by the investigators.

3.      It was argued that having the evaluating site implement all registrations in-house would destroy a vital advantage of the project: the parallelism produced by many sites performing registration tasks simultaneously.
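The first suggestion above, that the evaluating site distribute code for reformatting an image volume under a submitted transformation, can be sketched as follows. This is a hypothetical illustration, not code used in the project: it assumes NumPy, voxel coordinates with isotropic unit voxels, a rigid transformation of the form x_out = R x_in + t, and nearest-neighbour resampling. Comparing the volume reformatted with the submitted transformation against the volume produced by the registration software itself flags conversion errors such as the reversed rotation described earlier.

```python
import numpy as np


def reformat_nearest(volume, R, t, out_shape):
    """Resample `volume` onto a grid of shape `out_shape`, assuming the
    rigid map x_out = R @ x_in + t in voxel coordinates. Uses
    nearest-neighbour interpolation; output voxels that fall outside the
    input volume are set to 0."""
    # Coordinates of every output voxel, one row per voxel (C order).
    out_idx = np.indices(out_shape).reshape(3, -1).T.astype(float)
    # Pull back through the inverse map: x_in = R.T @ (x_out - t).
    src = np.rint((out_idx - t) @ R).astype(int)
    inside = np.all((src >= 0) & (src < np.array(volume.shape)), axis=1)
    out = np.zeros(out_idx.shape[0], dtype=volume.dtype)
    out[inside] = volume[tuple(src[inside].T)]
    return out.reshape(out_shape)


def mismatch_fraction(a, b):
    """Fraction of voxels at which two reformatted volumes disagree."""
    return float(np.mean(a != b))


def rotation_x(deg):
    """Rotation by `deg` degrees about the x axis."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def about_center(R, shape):
    """Translation that makes R rotate about the centre of a volume."""
    center = (np.array(shape, dtype=float) - 1.0) / 2.0
    return center - R @ center


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    volume = rng.integers(0, 255, size=(32, 32, 32))  # hypothetical image
    R = rotation_x(8.0)
    # Stand-in for what the investigator's own software produced.
    reference = reformat_nearest(volume, R, about_center(R, volume.shape), volume.shape)
    # A submission with the correct sign versus one whose rotation was
    # reversed during conversion to the standard format.
    for label, R_sub in [("correct", R), ("reversed", rotation_x(-8.0))]:
        out = reformat_nearest(volume, R_sub, about_center(R_sub, volume.shape), volume.shape)
        print(label, mismatch_fraction(reference, out))
```

In practice the registration software may interpolate differently from this resampler, so even a correct submission would show a small nonzero disagreement, and an intensity-difference measure might be more appropriate than an exact voxel match; where to set the threshold for flagging a submission is a judgment call.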

What's next?

1.      The NINDS project is still under way at Vanderbilt. The chair announced that anyone interested in learning about the status of the project and/or participating in the evaluation could contact him by email at jmf@vuse.vanderbilt.edu.

2.      One of the co-chairs of the Symposium's Image Processing track, Dr. Murray Loew, plans to organize a second workshop to be scheduled in conjunction with the 1997 Medical Imaging Symposium.

 
