In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks.
We also propose a mask-guided sparse video Transformer, which achieves high efficiency by discarding unnecessary and redundant tokens.
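The efficiency idea behind a mask-guided sparse Transformer can be illustrated with a small sketch: tokens outside the mask are dropped before the expensive attention blocks and scattered back afterwards. The function names and the NumPy setup below are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def sparsify_tokens(tokens: np.ndarray, mask: np.ndarray):
    """Keep only tokens flagged by the mask; return them with their indices.

    tokens: (N, D) token embeddings; mask: (N,) 0/1 relevance flags.
    """
    idx = np.flatnonzero(mask)
    return tokens[idx], idx

def scatter_back(full: np.ndarray, updated: np.ndarray, idx: np.ndarray) -> np.ndarray:
    """Write the processed sparse tokens back into the dense sequence."""
    out = full.copy()
    out[idx] = updated
    return out

# Toy example: 6 tokens, only 2 fall inside the inpainting mask,
# so only those 2 pass through the (stand-in) Transformer computation.
tokens = np.arange(12, dtype=float).reshape(6, 2)
mask = np.array([0, 1, 0, 0, 1, 0])
kept, idx = sparsify_tokens(tokens, mask)
processed = kept * 2.0                      # stand-in for the Transformer blocks
restored = scatter_back(tokens, processed, idx)
```

Because attention cost grows quadratically in sequence length, reducing 6 tokens to 2 here would cut that term by roughly 9x; the same gather/scatter pattern scales to real video token grids.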
Ranked #1 on Video Inpainting on DAVIS.
We believe that the main ingredient in the success of CLIP is its data, not the model architecture or pre-training objective.
In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity.
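A minimal sketch of the densification idea: insert new points where the existing set is not compact, i.e. where a point's nearest neighbor is far away. This is a simplified stand-in for compactness-based densification of Gaussian centers; the function name, the midpoint rule, and the threshold are all assumptions for illustration.

```python
import numpy as np

def densify_by_compactness(centers: np.ndarray, threshold: float) -> np.ndarray:
    """Insert a point at the midpoint of every nearest-neighbor pair whose
    gap exceeds `threshold`, filling holes in the point set.
    (Illustrative stand-in; real 3D Gaussians also carry scale/opacity.)"""
    new_points = []
    for i, c in enumerate(centers):
        dists = np.linalg.norm(centers - c, axis=1)
        dists[i] = np.inf                      # ignore self-distance
        j = int(np.argmin(dists))
        if dists[j] > threshold:               # isolated point: fill the gap
            new_points.append((c + centers[j]) / 2.0)
    if not new_points:
        return centers
    return np.vstack([centers, np.array(new_points)])

# One isolated point (gap of 3.0 > 1.0) triggers a single midpoint insertion.
centers = np.array([[0.0, 0.0], [0.0, 3.0], [0.0, 3.5]])
dense = densify_by_compactness(centers, threshold=1.0)
```

Iterating such a step progressively closes gaps between primitives, which is the continuity/fidelity effect the stage above is after.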
To preserve the precision and detail of the line drawings, we propose a new approach, AnimeInbet, which geometrizes raster line drawings into graphs of endpoints and reframes the inbetweening task as a graph fusion problem with vertex repositioning.
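The graph view of a line drawing can be made concrete with a small sketch: endpoints become graph vertices with 2D positions, strokes become edges, and an inbetween frame is produced by repositioning vertices between two corresponded keyframes. The class and function below are hypothetical simplifications (they assume identical topology and a known 1:1 vertex correspondence, which the real task must estimate).

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class LineGraph:
    """A line drawing as a geometric graph: 2D endpoint positions plus edges."""
    vertices: np.ndarray          # (V, 2) endpoint coordinates
    edges: list                   # list of (i, j) index pairs joining endpoints

def inbetween(g0: LineGraph, g1: LineGraph, t: float) -> LineGraph:
    """Naive inbetween of two corresponded graphs: linearly reposition each
    vertex between its location in g0 and g1; topology is carried over."""
    assert g0.vertices.shape == g1.vertices.shape and g0.edges == g1.edges
    verts = (1.0 - t) * g0.vertices + t * g1.vertices
    return LineGraph(verts, list(g0.edges))

# A single horizontal stroke moving upward across two keyframes.
g0 = LineGraph(np.array([[0.0, 0.0], [2.0, 0.0]]), [(0, 1)])
g1 = LineGraph(np.array([[0.0, 2.0], [2.0, 2.0]]), [(0, 1)])
mid = inbetween(g0, g1, t=0.5)
```

Because the output stays vectorized (positions plus connectivity), the inbetween frame can be re-rasterized at full precision instead of being blurred by pixel-space interpolation.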
Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans.
Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos.
We propose InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition.
At the core of this paradigm lies ChatDev, a virtual chat-powered software development company that mirrors the established waterfall model, meticulously dividing the development process into four distinct chronological stages: designing, coding, testing, and documenting.
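The waterfall structure described above can be sketched as a simple staged pipeline, where each stage consumes the previous stage's artifact. The names and the string-based "artifact" are hypothetical placeholders; in ChatDev each stage is realized as a multi-agent dialogue rather than a plain function.

```python
from typing import Callable, List, Tuple

def run_waterfall(task: str, stages: List[Tuple[str, Callable[[str], str]]]) -> str:
    """Run the stages strictly in order, threading one artifact through all of
    them; the chronological ordering is what makes this a waterfall model."""
    artifact = task
    for _name, stage in stages:
        artifact = stage(artifact)   # in ChatDev, a chat between agents
    return artifact

# Toy stages mirroring the four phases named in the text.
stages = [
    ("designing",   lambda a: a + " -> design"),
    ("coding",      lambda a: a + " -> code"),
    ("testing",     lambda a: a + " -> tested"),
    ("documenting", lambda a: a + " -> docs"),
]
result = run_waterfall("todo app", stages)
```

The design choice being illustrated: later stages never run before earlier ones finish, so each dialogue starts from a fully settled upstream artifact.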
LongLoRA extends LLaMA2 7B from a 4k context to 100k, and LLaMA2 70B to a 32k context, on a single 8x A100 machine.