The problem of indexing dynamic collections of persistent objects in order to support efficient execution of similarity queries, is elaborated. Motivated by urgent needs of multimedia applications, non vector-based representations of object features are considered - existing index structures either do not apply to this case at all or do not guarantee stable performance in dynamic environments. Our solution leads to the definition of a new dynamic secondary-memory paged tree structure, called M-tree, which always remains balanced and can be applied whenever the distance function used to quantify object similarity is a metric. Basic principles of M-tree have been defined, and first prototype system implemented.