Search Results for author: Yuhao Chen

Found 66 papers, 7 papers with code

Ice Hockey Puck Localization Using Contextual Cues

no code implementations 4 Jun 2025 Liam Salass, Jerrin Bright, Amir Nazemi, Yuhao Chen, John Zelek, David Clausi

For evaluation, in addition to standard average precision, we propose Rink Space Localization Error (RSLE), a scale-invariant homography-based metric for removing perspective bias from rink space evaluation.

Gen4D: Synthesizing Humans and Scenes in the Wild

no code implementations 3 Jun 2025 Jerrin Bright, Zhibo Wang, Yuhao Chen, Sirisha Rambhatla, John Zelek, David Clausi

Lack of input data for in-the-wild activities often results in low performance across various computer vision tasks.

FoodTrack: Estimating Handheld Food Portions with Egocentric Video

no code implementations 7 May 2025 Ervin Wang, Yuhao Chen

Accurately tracking food consumption is crucial for nutrition and health monitoring.

Gesture Recognition Nutrition

6D Pose Estimation on Spoons and Hands

no code implementations 5 May 2025 Kevin Tan, Fan Yang, Yuhao Chen

Accurate dietary monitoring is essential for promoting healthier eating habits.

6D Pose Estimation Position +3

Dietary Intake Estimation via Continuous 3D Reconstruction of Food

no code implementations 1 May 2025 Wallace Lee, Yuhao Chen

Monitoring dietary habits is crucial for preventing health risks associated with overeating and undereating, including obesity, diabetes, and cardiovascular diseases.

3D Reconstruction Pose Estimation

SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos

no code implementations 10 Apr 2025 Joshua Li, Fernando Jose Pena Cantu, Emily Yu, Alexander Wong, Yuchen Cui, Yuhao Chen

Then, we employ a matching algorithm to map each object in the scene graph with a SAM2-generated or SAM2-propagated mask, producing a temporally-consistent scene graph in dynamic environments.

Graph Generation Object +2

Deep Learning for Time Series Forecasting: A Survey

no code implementations 13 Mar 2025 Xiangjie Kong, Zhenghao Chen, Weiyao Liu, Kaili Ning, Lechao Zhang, Syauqie Muhammad Marier, Yichen Liu, Yuhao Chen, Feng Xia

However, existing surveys have not provided a unified summary of the wide range of model architectures in this field, nor have they given detailed summaries of works in feature extraction and datasets.

Deep Learning Survey +2

Asymmetric Decision-Making in Online Knowledge Distillation: Unifying Consensus and Divergence

no code implementations 9 Mar 2025 Zhaowei Chen, Borui Zhao, Yuchen Ge, Yuhao Chen, RenJie Song, Jiajun Liang

Building on these findings, we propose Asymmetric Decision-Making (ADM) to enhance feature consensus learning for student models while continuously promoting feature diversity in teacher models.

Decision Making Knowledge Distillation +2

LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding

no code implementations 6 Mar 2025 Shen Zhang, Yaning Tan, Siyuan Liang, Linze Li, Ge Wu, Yuhao Chen, Shuheng Li, Zhenyu Zhao, Caihua Chen, Jiajun Liang, Yao Tang

Diffusion transformers (DiTs) struggle to generate images at resolutions higher than their training resolutions.

Streamlining the Collaborative Chain of Models into A Single Forward Pass in Generation-Based Tasks

1 code implementation 16 Feb 2025 Yuanjie Lyu, Chao Zhang, Yuhao Chen, Yong Chen, Tong Xu

In Retrieval-Augmented Generation (RAG) and agent-based frameworks, the "Chain of Models" approach is widely used, where multiple specialized models work sequentially on distinct sub-tasks.

RAG Retrieval-augmented Generation

Semantic Communication with Entropy-and-Channel-Adaptive Rate Control over Multi-User MIMO Fading Channels

no code implementations 26 Jan 2025 Weixuan Chen, Qianqian Yang, Yuhao Chen, Chongwen Huang, Qian Wang, Zehui Xiong, Zhaoyang Zhang

Although significant improvements in transmission efficiency have been achieved, existing semantic communication (SemCom) methods typically use a fixed transmission rate for varying channel conditions and transmission contents, leading to performance degradation under harsh channel conditions.

Semantic Communication

FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting

no code implementations CVPR 2025 Fangyu Wu, Yuhao Chen

In the real world, objects reveal internal textures when sliced or cut, yet this behavior is not well-studied in 3D generation tasks today.

3D Generation 3DGS +1

Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts

no code implementations 25 Oct 2024 E. Zhixuan Zeng, Yuhao Chen, Alexander Wong

This method allows for the automatic understanding of hidden features and supports a broader range of analysis without the need to train specific vectors.

Denoising Image Captioning +1

SplatPose+: Real-time Image-Based Pose-Agnostic 3D Anomaly Detection

1 code implementation 15 Oct 2024 Yizhe Liu, Yan Song Hu, Yuhao Chen, John Zelek

Image-based Pose-Agnostic 3D Anomaly Detection is an important task that has emerged in industrial quality control.

3D Anomaly Detection 3DGS +1

MGSO: Monocular Real-time Photometric SLAM with Efficient 3D Gaussian Splatting

no code implementations 19 Sep 2024 Yan Song Hu, Nicolas Abboud, Muhammad Qasim Ali, Adam Srebrnjak Yang, Imad Elhajj, Daniel Asmar, Yuhao Chen, John S. Zelek

As a result, experiments show that our system generates reconstructions with a balance of quality, memory efficiency, and speed that outperforms the state-of-the-art.

3DGS 3D Reconstruction

Towards Real-Time Gaussian Splatting: Accelerating 3DGS through Photometric SLAM

no code implementations 7 Aug 2024 Yan Song Hu, Dayou Mao, Yuhao Chen, John Zelek

Initial applications of 3D Gaussian Splatting (3DGS) in Visual Simultaneous Localization and Mapping (VSLAM) demonstrate the generation of high-quality volumetric reconstructions from monocular video streams.

3DGS Simultaneous Localization and Mapping

Dense Monocular Motion Segmentation Using Optical Flow and Pseudo Depth Map: A Zero-Shot Approach

no code implementations 27 Jun 2024 Yuxiang Huang, Yuhao Chen, John Zelek

In contrast, traditional methods based on optical flow do not require training data; however, they often fail to capture object-level information, leading to over-segmentation or under-segmentation.

Monocular Depth Estimation Motion Segmentation +2

Understanding the Limitations of Diffusion Concept Algebra Through Food

no code implementations 5 Jun 2024 E. Zhixuan Zeng, Yuhao Chen, Alexander Wong

Image generation techniques, particularly latent diffusion models, have exploded in popularity in recent years.

Diversity Image Generation

Multi Player Tracking in Ice Hockey with Homographic Projections

no code implementations 22 May 2024 Harish Prakash, Jia Cheng Shang, Ken M. Nsiempba, Yuhao Chen, David A. Clausi, John S. Zelek

Multi Object Tracking (MOT) in ice hockey pursues the combined task of localizing and associating players across a given sequence to maintain their identities.

Graph Matching Multi-Object Tracking

Region-level labels in ice charts can produce pixel-level segmentation for Sea Ice types

no code implementations 16 May 2024 Muhammed Patel, Xinwei Chen, Linlin Xu, Yuhao Chen, K Andrea Scott, David A. Clausi

Fully supervised deep learning approaches have demonstrated impressive accuracy in sea ice classification, but their dependence on high-resolution labels presents a significant challenge due to the difficulty of obtaining such data.

Weakly-supervised Learning

PitcherNet: Powering the Moneyball Evolution in Baseball Video Analytics

no code implementations 13 May 2024 Jerrin Bright, Bavesh Balaji, Yuhao Chen, David A Clausi, John S Zelek

In the high-stakes world of baseball, every nuance of a pitcher's mechanics holds the key to maximizing performance and minimizing runs.

Position

NutritionVerse-Direct: Exploring Deep Neural Networks for Multitask Nutrition Prediction from Food Images

no code implementations 13 May 2024 Matthew Keller, Chi-en Amy Tai, Yuhao Chen, Pengcheng Xi, Alexander Wong

Many aging individuals encounter challenges in effectively tracking their dietary intake, exacerbating their susceptibility to nutrition-related health complications.

Nutrition

In The Wild Ellipse Parameter Estimation for Circular Dining Plates and Bowls

no code implementations 12 May 2024 Akil Pathiranage, Chris Czarnecki, Yuhao Chen, Pengcheng Xi, Linlin Xu, Alexander Wong

Ellipse estimation is an important topic in food image processing because it can be leveraged to parameterize plates and bowls, which in turn can be used to estimate camera view angles and food portion sizes.

parameter estimation Semantic Segmentation

Zero-Shot Monocular Motion Segmentation in the Wild by Combining Deep Learning with Geometric Motion Model Fusion

no code implementations 2 May 2024 Yuxiang Huang, Yuhao Chen, John Zelek

Detecting and segmenting moving objects from a moving monocular camera is challenging in the presence of unknown camera motion, diverse object motions and complex scene structures.

Motion Segmentation Segmentation

AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment

no code implementations 7 Apr 2024 Yuanfeng Xu, Yuhao Chen, Zhongzhan Huang, Zijian He, Guangrun Wang, Philip Torr, Liang Lin

In this paper, we present AnimateZoo, a zero-shot diffusion-based video generator to address this challenging cross-species animation issue, aiming to accurately produce animal animations while preserving the background.

Video Editing Video Generation

Domain-Guided Masked Autoencoders for Unique Player Identification

no code implementations 17 Mar 2024 Bavesh Balaji, Jerrin Bright, Sirisha Rambhatla, Yuhao Chen, Alexander Wong, John Zelek, David A Clausi

We further introduce a new spatio-temporal network leveraging our novel d-MAE for unique player identification.

Sports Analytics

Distribution and Depth-Aware Transformers for 3D Human Mesh Recovery

no code implementations 14 Mar 2024 Jerrin Bright, Bavesh Balaji, Harish Prakash, Yuhao Chen, David A Clausi, John Zelek

Precise Human Mesh Recovery (HMR) with in-the-wild data is a formidable challenge and is often hindered by depth ambiguities and reduced precision.

Human Mesh Recovery

Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities

no code implementations 22 Dec 2023 Yuhao Chen, Chloe Wong, Hanwen Yang, Juan Aguenza, Sai Bhujangari, Benthan Vu, Xun Lei, Amisha Prasad, Manny Fluss, Eric Phuong, Minghao Liu, Raja Kumar, Vanshika Vats, James Davis

This study critically evaluates the efficacy of prompting methods in enhancing the mathematical reasoning capability of large language models (LLMs).

Chatbot GSM8K +5

NutritionVerse-Synth: An Open Access Synthetically Generated 2D Food Scene Dataset for Dietary Intake Estimation

no code implementations 11 Dec 2023 Saeejith Nair, Chi-en Amy Tai, Yuhao Chen, Alexander Wong

As the largest open-source synthetic food dataset, NV-Synth highlights the value of physics-based simulations for enabling scalable and controllable generation of diverse photorealistic meal images to overcome data limitations and drive advancements in automated dietary assessment using computer vision.

Diversity

FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation

no code implementations 6 Dec 2023 Olivia Markham, Yuhao Chen, Chi-en Amy Tai, Alexander Wong

To address these limitations, we introduce FoodFusion, a Latent Diffusion model engineered specifically for the faithful synthesis of realistic food images from textual descriptions.

Diversity Image Generation

Cancer-Net PCa-Gen: Synthesis of Realistic Prostate Diffusion Weighted Imaging Data via Anatomic-Conditional Controlled Latent Diffusion

no code implementations 30 Nov 2023 Aditya Sridhar, Chi-en Amy Tai, Hayden Gunraj, Yuhao Chen, Alexander Wong

In Canada, prostate cancer is the most common form of cancer in men and accounted for 20% of new cancer cases for this demographic in 2022.

Prognosis

HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models

no code implementations 29 Nov 2023 Shen Zhang, Zhaowei Chen, Zhenyu Zhao, Yuhao Chen, Yao Tang, Jiajun Liang

Extensive experiments demonstrate that our approach can address object duplication and heavy computation issues, achieving state-of-the-art performance on higher-resolution image synthesis tasks.

Attribute Image Generation +1

Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training

no code implementations 22 Nov 2023 Yuhao Chen, Yuxuan Yan, Qianqian Yang, Yuanchao Shu, Shibo He, Jiming Chen

Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks.

NutritionVerse-Real: An Open Access Manually Collected 2D Food Scene Dataset for Dietary Intake Estimation

no code implementations 20 Nov 2023 Chi-en Amy Tai, Saeejith Nair, Olivia Markham, Matthew Keller, Yifan Wu, Yuhao Chen, Alexander Wong

Dietary intake estimation plays a crucial role in understanding the nutritional habits of individuals and populations, aiding in the prevention and management of diet-related health issues.

Diversity Management

AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

no code implementations 10 Nov 2023 Yuhao Chen, Yuxuan Yan, Qianqian Yang, Yuanchao Shu, Shibo He, Zhiguo Shi, Jiming Chen

Moreover, we propose a bit-level computation-efficient data compression scheme to compress the data to be transmitted between devices during training.

Data Compression

NAS-NeRF: Generative Neural Architecture Search for Neural Radiance Fields

no code implementations 25 Sep 2023 Saeejith Nair, Yuhao Chen, Mohammad Javad Shafiee, Alexander Wong

Thus, there is a need to dynamically optimize the neural network component of NeRFs to achieve a balance between computational complexity and specific targets for synthesis quality.

NeRF Neural Architecture Search +2

NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches

no code implementations 14 Sep 2023 Chi-en Amy Tai, Matthew Keller, Saeejith Nair, Yuhao Chen, Yifan Wu, Olivia Markham, Krish Parmar, Pengcheng Xi, Heather Keller, Sharon Kirkpatrick, Alexander Wong

Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images, but the lack of comprehensive datasets with diverse viewpoints, modalities and food annotations hinders the accuracy and realism of such methods.

Jersey Number Recognition using Keyframe Identification from Low-Resolution Broadcast Videos

no code implementations 12 Sep 2023 Bavesh Balaji, Jerrin Bright, Harish Prakash, Yuhao Chen, David A Clausi, John Zelek

To address these issues, we propose a robust keyframe identification module that extracts frames containing essential high-level information about the jersey number.

Jersey Number Recognition

Mitigating Motion Blur for Robust 3D Baseball Player Pose Modeling for Pitch Analysis

no code implementations 2 Sep 2023 Jerrin Bright, Yuhao Chen, John Zelek

The findings highlight the effectiveness of our method in mitigating the challenges posed by motion blur, thereby enhancing the overall quality of pose estimation.

3D Pose Estimation Data Augmentation +1

The Model Inversion Eavesdropping Attack in Semantic Communication Systems

no code implementations 8 Aug 2023 Yuhao Chen, Qianqian Yang, Zhiguo Shi, Jiming Chen

In recent years, semantic communication has been a popular research topic for its superiority in communication efficiency.

Semantic Communication

Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions

no code implementations 15 Jun 2023 Grant Sinha, Krish Parmar, Hilda Azimi, Amy Tai, Yuhao Chen, Alexander Wong, Pengcheng Xi

To address these issues, two models are trained and compared, one based on convolutional neural networks and the other on Bidirectional Encoder representation for Image Transformers (BEiT).

Image Segmentation Segmentation +1

Deep Joint Source-Channel Coding for Wireless Image Transmission with Entropy-Aware Adaptive Rate Control

no code implementations 5 Jun 2023 Weixuan Chen, Yuhao Chen, Qianqian Yang, Chongwen Huang, Qian Wang, Zhaoyang Zhang

Adaptive rate control for deep joint source and channel coding (JSCC) is considered an effective approach to transmit sufficient information in scenarios with limited communication resources.

Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge

no code implementations 21 Apr 2023 Alexander Wong, Yifan Wu, Saad Abbasi, Saeejith Nair, Yuhao Chen, Mohammad Javad Shafiee

As such, the design of highly efficient multi-task deep neural network architectures tailored for computer vision tasks for robotic grasping on the edge is highly desired for widespread adoption in manufacturing environments.

Multi-Task Learning Robotic Grasping

NutritionVerse-3D: A 3D Food Model Dataset for Nutritional Intake Estimation

no code implementations 12 Apr 2023 Chi-en Amy Tai, Matthew Keller, Mattie Kerrigan, Yuhao Chen, Saeejith Nair, Pengcheng Xi, Alexander Wong

Unlike existing datasets, a collection of 3D models with nutritional information allows for view synthesis to create an infinite number of 2D images for any given viewpoint/camera angle along with the associated nutritional information.

Nutrition

NutritionVerse-Thin: An Optimized Strategy for Enabling Improved Rendering of 3D Thin Food Models

no code implementations 12 Apr 2023 Chi-en Amy Tai, Jason Li, Sriram Kumar, Saeejith Nair, Yuhao Chen, Pengcheng Xi, Alexander Wong

With the growth in capabilities of generative models, there has been growing interest in using photo-realistic renders of common 3D food items to improve downstream tasks such as food printing, nutrition prediction, or management of food wastage.

Management NeRF +1

ShapeShift: Superquadric-based Object Pose Estimation for Robotic Grasping

no code implementations 10 Apr 2023 E. Zhixuan Zeng, Yuhao Chen, Alexander Wong

To address these challenges, this paper proposes ShapeShift, a superquadric-based framework for object pose estimation that predicts the object's pose relative to a primitive shape which is fitted to the object.

Object Pose Estimation +1

MMRNet: Improving Reliability for Multimodal Object Detection and Segmentation for Bin Picking via Multimodal Redundancy

no code implementations 19 Oct 2022 Yuhao Chen, Hayden Gunraj, E. Zhixuan Zeng, Robbie Meyer, Maximilian Gilles, Alexander Wong

We also demonstrate that our MC score is a more reliable indicator for outputs during inference time compared to the model-generated confidence scores that are often over-confident.

Ensemble Learning object-detection +1

MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis

no code implementations 8 Aug 2022 Maximilian Gilles, Yuhao Chen, Tim Robin Winter, E. Zhixuan Zeng, Alexander Wong

Autonomous bin picking poses significant challenges to vision-driven robotic systems given the complexity of the problem, ranging from various sensor modalities, to highly entangled object layouts, to diverse item properties and gripper types.

Keypoint Detection Object +2

Demo: low-power communications based on RIS and AI for 6G

no code implementations 21 May 2022 Mingyao Cui, Zidong Wu, Yuhao Chen, Shenheng Xu, Fan Yang, Linglong Dai

By jointly designing the hardware and software, this prototype can realize real-time 4K video transmission with much reduced power consumption.

4k

MetaGraspNet_v0: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis

1 code implementation 29 Dec 2021 Yuhao Chen, E. Zhixuan Zeng, Maximilian Gilles, Alexander Wong

We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance in a manner that is more appropriate for robotic grasp applications compared to existing general-purpose performance metrics.

Object object-detection +3

FTPipeHD: A Fault-Tolerant Pipeline-Parallel Distributed Training Framework for Heterogeneous Edge Devices

no code implementations 6 Oct 2021 Yuhao Chen, Qianqian Yang, Shibo He, Zhiguo Shi, Jiming Chen

Our numerical results demonstrate that FTPipeHD is 6.8x faster in training than the state-of-the-art method when the computing capacity of the best device is 10x greater than the worst one.

Low Resolution Information Also Matters: Learning Multi-Resolution Representations for Person Re-Identification

no code implementations 26 May 2021 Guoqing Zhang, Yuhao Chen, Weisi Lin, Arun Chandran, Xuan Jing

As a prevailing task in the fields of video surveillance and forensics, person re-identification (re-ID) aims to match person images captured from non-overlapped cameras.

Person Re-Identification Super-Resolution +1

TIPCB: A Simple but Effective Part-based Convolutional Baseline for Text-based Person Search

1 code implementation 25 May 2021 Yuhao Chen, Guoqing Zhang, Yujiang Lu, Zhenxing Wang, Yuhui Zheng, Ruili Wang

Text-based person search is a sub-task in the field of image retrieval, which aims to retrieve target person images according to a given textual description.

Image Retrieval Person Search +3

Reference-Aided Part-Aligned Feature Disentangling for Video Person Re-Identification

no code implementations 21 Mar 2021 Guoqing Zhang, Yuhao Chen, Yang Dai, Yuhui Zheng, Yi Wu

Due to inaccurate person detections and pose changes, pedestrian misalignment significantly increases the difficulty of feature extraction and matching.

Video-Based Person Re-Identification

Quantization in Relative Gradient Angle Domain For Building Polygon Estimation

no code implementations 10 Jul 2020 Yuhao Chen, Yifan Wu, Linlin Xu, Alexander Wong

In this paper, we leverage the performance of CNNs, and propose a module that uses prior knowledge of building corners to create angular and concise building polygons from CNN segmentation outputs.

Quantization

A Voice Interactive Multilingual Student Support System using IBM Watson

no code implementations 20 Dec 2019 Kennedy Ralston, Yuhao Chen, Haruna Isah, Farhana Zulkernine

The chatbot could also be adapted for use in other application areas such as student info-centers, government kiosks, and mental health support systems.

Chatbot

Locating Objects Without Bounding Boxes

6 code implementations CVPR 2019 Javier Ribera, David Güera, Yuhao Chen, Edward J. Delp

In these networks, the training procedure usually requires providing bounding boxes or the maximum number of expected objects.

Object Object Localization +1
