I have moved my website hosting to github.io as of March 2024.


Ahmed Taha

PhD in Computer Vision/Machine Learning

Research Interests: Feature Embedding, Metric Learning, Deep Networks, Machine Learning, Image Segmentation, Texture Classification, Patch Matching.
Technical Skills: Python, C/C++, Java, MATLAB, MEX files, and CUDA
TensorFlow, PyTorch, Keras, OpenCV, SimpleITK, Caffe
Education: Computer Science PhD (+MS), GPA 4.0/4.0 - University of Maryland
Master of Business Administration (Marketing Major), GPA 3.83/4.0 - Arab Academy for Science and Technology
Computer Science BS, GPA 3.81/4.0 - Alexandria University, Faculty of Engineering



We propose M&M, a Multi-view and Multi-instance learning system to localize malignant findings in mammograms. M&M focuses on clinical applicability and demonstrates its superiority by (1) surpassing previous works by a large margin in the clinically relevant region of less than 1 false positive per image (left figure) and (2) remaining resilient in clinical practice, which includes abundant negative images (right figure). M&M: Tackling False Positives in Mammography with a Multi-view and Multi-instance Learning Sparse Detector, MICCAI 2023 (acceptance rate 32%)
Yen Nhi Truong Vu*, Dan Guo*, Ahmed Taha, Jason Su, Thomas Paul Matthews
Project Page   MICCAI Featured
Neural networks' building blocks. (Left) A standard convolutional block for vision models with spatial downsampling capability. (Center) A standard attention block for language models with long-range attention capability. (Right) Our Attention-Convolutional (AC) block with both spatial downsampling and long-range attention capabilities. In the AC block, the conv layer both reduces the spatial resolution and increases the number of channels. Batchnorm and activation (e.g., ReLU) layers are omitted for visualization purposes. Deep is a Luxury We Don't Have, MICCAI 2022 (acceptance rate 31%)
Ahmed Taha*, Yen Nhi Truong Vu*, Brent Mombourquette, Thomas Matthews, Jason Su, Sadanand Singh
Project Page
We propose singular value maximization (SVMax) to learn a more uniform feature embedding. The SVMax regularizer supports both supervised and unsupervised learning. Our formulation mitigates model collapse and enables larger learning rates. SVMax: A Feature Embedding Regularizer, arXiv 2021
Ahmed Taha, Alex Hanson, Abhinav Shrivastava, Larry Davis
Project Page
We propose an evolution-inspired training approach to boost performance on relatively small datasets. The knowledge evolution approach splits a deep network into two hypotheses: the fit-hypothesis and the reset-hypothesis. We iteratively evolve the knowledge inside the fit-hypothesis by perturbing the reset-hypothesis for multiple generations. Knowledge Evolution in Neural Networks, CVPR Oral 2021 (acceptance rate 24% -- Oral 4%)
Ahmed Taha, Abhinav Shrivastava, Larry Davis
Project Page
We formulate attention visualization as a constrained optimization problem. We leverage the unit L2-norm constraint as an attention filter (L2-CAF) to localize attention in both classification and retrieval networks. A Generic Visualization Approach for Convolutional Neural Networks, ECCV 2020 (acceptance rate 27%)
Ahmed Taha, Xitong Yang, Abhinav Shrivastava, Larry Davis
Project Page
We extend standard architectures with an embedding head. We apply a ranking regularizer to the embedding head to improve both (1) classification accuracy and (2) feature embedding. Boosting Standard Classification Architectures Through a Ranking Regularizer, WACV 2020 (acceptance rate 34.5%)
Ahmed Taha, Yi-Ting Chen, Teruhisa Misu, Abhinav Shrivastava, Larry Davis
Project Page
We introduce an unsupervised formulation to estimate heteroscedastic uncertainty in retrieval systems. We propose an extension to triplet loss that models data uncertainty for each input. Unsupervised Data Uncertainty Learning in Visual Retrieval Systems, arXiv 2019
Ahmed Taha, Yi-Ting Chen, Teruhisa Misu, Abhinav Shrivastava, Larry Davis
arXiv Link
We cast visual retrieval as a regression problem by posing triplet loss as a regression loss. This enables epistemic uncertainty estimation in retrieval using dropout as a Bayesian approximation framework. Accordingly, Monte Carlo sampling is leveraged to boost retrieval performance. Exploring Uncertainty in Conditional Multi-Modal Retrieval Systems, arXiv 2019
Ahmed Taha, Yi-Ting Chen, Teruhisa Misu, Larry Davis
arXiv Link
(a) An example of a sequential residual network. (b) An example of a directed acyclic graph. (c) The residual graph derived from (b). (d) The U-Net architecture. Downward arrows represent down-sampling or strided convolutions. Upward arrows represent up-sampling or deconvolution. (e) The Res. U-Net architecture. It consists of two threads. The first thread carries scale-specific features: the green channels. It follows an architecture similar to the U-Net. The second thread is the residual architecture, including the red, orange, and blue channels. Segmentation of Renal Structures for Image-Guided Surgery, MICCAI 2018 (acceptance rate 34.9%)
Junning Li, Pechin Lo, Ahmed Taha, Hang Wu, Tao Zhao
Kid-Net architecture. The two contradicting phases are colored in blue. The down-sampling and up-sampling phases detect and localize features, respectively. The segmentation results at different scale levels are averaged to compute the final segmentation. Kid-Net: Convolution Networks for Kidney Vessels Segmentation from CT-Volumes, MICCAI 2018 (acceptance rate 34.9%)
Ahmed Taha, Pechin Lo, Junning Li, Tao Zhao
Self-supervised learning framework formulated as a four-class classification problem. Given a tuple of an RGB frame and a stack of difference (SOD), the network reasons about frame ordering and spatio-temporal correspondence. Valid and invalid ordered motion are highlighted in green and red, respectively. Class III shows a tuple of a weightlifting RGB frame and a SOD encoding a boxing action, with a valid sequence but no spatio-temporal correspondence. Two Stream Self-Supervised Learning for Action Recognition, CVPRW 2018
Ahmed Taha, Moustafa Meshry, Xitong Yang, Yi-Ting Chen, Larry Davis
(DeepVision) - Extended Abstract Github Code
Texture Synthesis with Recurrent Variational Auto-Encoder, arXiv 2017
Rohan Chandra, Sachin Grover, Kyungjun Lee, Moustafa Meshry, Ahmed Taha
Github Code
Seeded Laplacian approach overview on a toy image of size 48 × 48. (b) Toy image is perturbed with Gaussian noise and overlaid by seeded annotations (blue for background and yellow for foreground). (c) Setting a zero threshold on the eigenfunction vector produces (d) the final segmentation result. Seeded Laplacian: An Interactive Image Segmentation Approach using Eigenfunctions, ICIP 2015
Ahmed Taha, Marwan Torki
Github Code
The effect of discriminative embedding. Left: Image with provided user scribbles, red for FG and blue for BG. Middle: 3D plot of the RGB channels for the provided scribbles. The scribbles are mixed in the RGB color space. Right: 3D plot of the first 3 dimensions of our discriminative embedding. The color modalities present in the scribbles are preserved. Note that the FG has two modalities, namely skin color and jeans. The BG also has two modalities: the sky and the horse body. Multi-Modality Feature Transform: An Interactive Image Segmentation Approach, BMVC 2015
Moustafa Meshry, Ahmed Taha, Marwan Torki


Research Experience:

Selected Awards:

Teaching Experience:


After earning my BSc, I spent some time developing mobile apps for iOS. I co-founded Inova, a software development company.