Search Results for author: David Dao

Found 11 papers, 8 papers with code

OAM-TCD: A globally diverse dataset of high-resolution tree cover maps

1 code implementation 16 Jul 2024 Josh Veitch-Michaelis, Andrew Cottam, Daniella Schweizer, Eben N. Broadbent, David Dao, Ce Zhang, Angelica Almeyda Zambrano, Simeon Max

Using our dataset, we train reference instance and semantic segmentation models that compare favorably to existing state-of-the-art models.

Semantic Segmentation

GEO-Bench: Toward Foundation Models for Earth Monitoring

1 code implementation NeurIPS 2023 Alexandre Lacoste, Nils Lehmann, Pau Rodriguez, Evan David Sherwin, Hannah Kerner, Björn Lütjens, Jeremy Andrew Irvin, David Dao, Hamed Alemohammad, Alexandre Drouin, Mehmet Gunturkun, Gabriel Huang, David Vazquez, Dava Newman, Yoshua Bengio, Stefano Ermon, Xiao Xiang Zhu

Recent progress in self-supervision has shown that pre-training large neural networks on vast amounts of unsupervised data can lead to substantial increases in generalization to downstream tasks.

Data Debugging with Shapley Importance over End-to-End Machine Learning Pipelines

1 code implementation 23 Apr 2022 Bojan Karlaš, David Dao, Matteo Interlandi, Bo Li, Sebastian Schelter, Wentao Wu, Ce Zhang

We present DataScope (ease.ml/datascope), the first system that efficiently computes Shapley values of training examples over an end-to-end ML pipeline, and illustrate its applications in data debugging for ML training.

BIG-bench Machine Learning Fairness

ReforesTree: A Dataset for Estimating Tropical Forest Carbon Stock with Deep Learning and Aerial Imagery

1 code implementation 26 Jan 2022 Gyri Reiersen, David Dao, Björn Lütjens, Konstantin Klemmer, Kenza Amara, Attila Steinegger, Ce Zhang, Xiaoxiang Zhu

The potential for impact and scale of leveraging advancements in machine learning and remote sensing technologies is promising but needs to be of high quality in order to replace the current forest stock protocols for certifications.

Toward Foundation Models for Earth Monitoring: Proposal for a Climate Change Benchmark

no code implementations 1 Dec 2021 Alexandre Lacoste, Evan David Sherwin, Hannah Kerner, Hamed Alemohammad, Björn Lütjens, Jeremy Irvin, David Dao, Alex Chang, Mehmet Gunturkun, Alexandre Drouin, Pau Rodriguez, David Vazquez

Recent progress in self-supervision shows that pre-training large neural networks on vast amounts of unsupervised data can lead to impressive increases in generalisation for downstream tasks.

Tackling the Overestimation of Forest Carbon with Deep Learning and Aerial Imagery

no code implementations 23 Jul 2021 Gyri Reiersen, David Dao, Björn Lütjens, Konstantin Klemmer, Xiaoxiang Zhu, Ce Zhang

This proposal paper describes the first systematic comparison of forest carbon estimation from aerial imagery, satellite imagery, and ground-truth field measurements via deep learning-based algorithms for a tropical reforestation project.

TrueBranch: Metric Learning-based Verification of Forest Conservation Projects

no code implementations 21 Apr 2020 Simona Santamaria, David Dao, Björn Lütjens, Ce Zhang

Recent works propose low-cost and accurate MRV via automatically determining forest carbon from drone imagery, collected by the landowners.

Metric Learning

Scalability vs. Utility: Do We Have to Sacrifice One for the Other in Data Importance Quantification?

1 code implementation CVPR 2021 Ruoxi Jia, Fan Wu, Xuehui Sun, Jiacen Xu, David Dao, Bhavya Kailkhura, Ce Zhang, Bo Li, Dawn Song

Quantifying the importance of each training point to a learning task is a fundamental problem in machine learning and the estimated importance scores have been leveraged to guide a range of data workflows such as data summarization and domain adaption.

Data Summarization Domain Adaptation

Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms

3 code implementations 22 Aug 2019 Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Bo Li, Ce Zhang, Costas J. Spanos, Dawn Song

The most surprising result is that for unweighted $K$NN classifiers and regressors, the Shapley value of all $N$ data points can be computed, exactly, in $O(N\log N)$ time -- an exponential improvement on computational complexity!

Data Valuation Fairness
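
The $O(N\log N)$ claim in the abstract above can be illustrated with a minimal sketch for a single test point and an unweighted $K$NN classifier: sort the training points by distance once, then fill in the values with one backward pass. The function name `knn_shapley_single` and its argument layout are hypothetical, chosen only for this illustration, and the recursion is written as commonly stated for this setting rather than copied from the authors' code.

```python
def knn_shapley_single(distances, train_labels, test_label, k):
    """Sketch: exact Shapley values of training points for one test point
    under an unweighted KNN classifier (utility = fraction of the K nearest
    neighbours whose label matches the test label).

    All names here are illustrative assumptions, not a library API.
    """
    n = len(distances)
    if n == 0:
        return []

    # Sorting by distance dominates the cost: O(N log N).
    order = sorted(range(n), key=lambda i: distances[i])
    # match[pos] is 1.0 if the point at sorted position pos has the test label.
    match = [1.0 if train_labels[i] == test_label else 0.0 for i in order]

    sorted_values = [0.0] * n
    # Base case: the farthest point.
    sorted_values[n - 1] = match[n - 1] / n
    # Backward recursion from the second-farthest point to the nearest: O(N).
    for pos in range(n - 2, -1, -1):
        rank = pos + 1  # 1-based rank in the sorted order
        sorted_values[pos] = (
            sorted_values[pos + 1]
            + (match[pos] - match[pos + 1]) / k * min(k, rank) / rank
        )

    # Map the values back to the original training-point order.
    values = [0.0] * n
    for pos, idx in enumerate(order):
        values[idx] = sorted_values[pos]
    return values


vals = knn_shapley_single([0.1, 0.5, 0.9], ["a", "b", "a"], "a", k=1)
```

For several test points, one would average the per-test-point values; the values for each test point sum to the KNN utility of the full training set, which is a quick sanity check on any implementation.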

Towards Efficient Data Valuation Based on the Shapley Value

1 code implementation 27 Feb 2019 Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos

In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory.

Data Valuation
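
For reference, the Shapley value mentioned in the abstract above has a standard definition from cooperative game theory. The notation below is an assumption made for this illustration (training set $D$ of $N$ points, utility function $U$ mapping a subset to a score such as test accuracy) and is not taken from the listing itself.

```latex
% Shapley value of training point z_i under utility U over dataset D, |D| = N
\phi_i
  = \sum_{S \subseteq D \setminus \{z_i\}}
    \frac{|S|!\,(N - |S| - 1)!}{N!}
    \bigl[ U(S \cup \{z_i\}) - U(S) \bigr]
```

The sum ranges over $2^{N-1}$ subsets, which is why evaluating it directly becomes impractical as $N$ grows.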

DataBright: Towards a Global Exchange for Decentralized Data Ownership and Trusted Computation

1 code implementation 13 Feb 2018 David Dao, Dan Alistarh, Claudiu Musat, Ce Zhang

We illustrate that trusted computation can enable the creation of an AI market, where each data point has an exact value that should be paid to its creator.

BIG-bench Machine Learning
