In recent years, Face Anti-Spoofing (FAS) has played a crucial role in preserving the security of face recognition technology.
We also perform a case study of a large codebase where PyGlove led to an 80% reduction in the number of lines of code.
We present PhysGen, a novel image-to-video generation method that converts a single image and an input condition (e.g., force and torque applied to an object in the image) into a realistic, physically plausible, and temporally consistent video.
Under the W2*A8 quantization configuration on the LLaMA-7B model, it achieved a WikiText2 perplexity of 7.59 (2.17$\downarrow$ vs 9.76 in AffineQuant).
For example, compared with the previous state-of-the-art~\cite{ISR}, CION with the same ResNet50-IBN achieves higher mAP of 93.3\% and 74.3\% on Market1501 and MSMT17, respectively, while utilizing only 8\% of the training samples.
In recent years, 3D hand pose estimation methods have garnered significant attention due to their extensive applications in human-computer interaction, virtual reality, and robotics.
Ranked #1 on 3D Hand Pose Estimation on HO-3D v2
Recent advancements in Large Language Models (LLMs) have expanded their capabilities to multimodal contexts, including comprehensive video understanding.
Due to the redundancy in LLM weights, recent research has focused on pushing weight-only quantization to extremely low-bit (even down to 2 bits).
Finally, we build an end-to-end framework on top of our abstraction to automatically optimize deep learning models for given tensor computation primitives.
Previous robot learning methods often collect data to train with one specific embodiment for one task, which is expensive and prone to overfitting.