Better Data For Better AI: New Speech Datasets And Benchmarks For Data

  • aster.cloud
  • December 25, 2021
  • 2 minute read

AI researchers and engineers need better data to enable better AI solutions. The quality of an AI solution is determined by both the learning algorithm (such as a deep neural network model) and the datasets used to train and evaluate that algorithm. Historically, AI research has focused much more on algorithms than on datasets, despite the vital importance of data. As a result, many algorithms are freely available as starting points, but many important problems lack large, high-quality open datasets. Further, creating new datasets is expensive and error-prone.

Recently, the data-centric AI movement has emerged to fix this problem by developing new methodologies and tools for constructing better datasets. Conferences, workshops, challenges, and platforms are being launched to support improving data quality and to foster data excellence. Thought leaders such as Andrew Ng at Landing.AI and Chris Ré at Stanford University are encouraging AI developers to focus more on iterative data engineering than on tuning their learning algorithms. Our paper “Everyone wants to do the model work, not the data work,” which won a CHI best paper award, highlighted the significance of data quality in the practice of ML.


At Google, we are excited to contribute to data-centric AI. Today, Google Cloud is adding a new high-value dataset to the Public Datasets program, and Google researchers are announcing DataPerf, a new multi-organizational effort to develop benchmarks for data quality and data-centric algorithms.

Google Cloud is committed to helping users improve their data quality, starting with supporting better public data. The Public Datasets program provides high-quality datasets pre-configured on GCP for easy access. Google Cloud is adding a new high-value dataset developed by the MLCommons™ Association (which Google co-founded) to the Public Datasets program: the Multilingual Spoken Words Corpus, a rich audio speech dataset with more than 340,000 keywords in 50 languages and upwards of 23.4 million examples.

This new public dataset is aligned with the MLCommons Association's vision for “open” datasets – accessible by all – that are also “living” – continually improved to raise quality and increase representation and diversity.

Google researchers, in collaboration with multiple organizations, are announcing the DataPerf effort at the NeurIPS Data-Centric AI workshop today, to develop benchmarks to improve data quality. Much like the MLPerf™ benchmarking effort, which is now the industry standard for machine learning hardware/software speed, DataPerf brings together the originators of prior efforts, including CATS4ML, Data-Centric AI Competition, DCBench, Dynabench, and the MLPerf benchmarks, to define clear metrics that catalyze rapid innovation. DataPerf will measure the utility of training and test data for common problems, as well as algorithms for working with datasets, such as selecting core sets, correcting errors, identifying under-optimized data slices, and valuing datasets prior to labeling.
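
To make one of these tasks concrete, here is a minimal sketch of core-set selection using greedy k-center (farthest-point) selection. It is only an illustration of the general idea and is not taken from DataPerf; the function name `greedy_k_center`, the toy `features` list, and the choice to seed with the first example are all assumptions made for this example.

```python
import math


def greedy_k_center(points, k):
    """Return indices of k points chosen by greedy farthest-point selection.

    Illustrative assumption: selection is seeded with index 0, and distance
    is plain Euclidean distance between feature vectors.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [0]
    # Distance from every point to its nearest already-selected point.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(selected) < k:
        # Pick the point that the current selection covers worst.
        idx = max(range(len(points)), key=nearest.__getitem__)
        selected.append(idx)
        nearest = [
            min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)
        ]
    return selected


# Hypothetical 2-D feature vectors: two tight pairs and one outlier.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
core = greedy_k_center(features, 3)
print(core)  # one index from each of the three regions
```

On this toy input the selection keeps one example from each region and skips the near-duplicates, which is the intuition behind a core set: a small subset that still covers the data. Real benchmarks would score such a subset by the quality of a model trained on it, not by distances alone.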

Together, support for open, living datasets for core ML tasks and the development of benchmarks to direct the rapid evolution of those datasets will empower the researchers and engineers who use Google Cloud to do even more amazing things – and we can’t wait to see what they create!


Acknowledgements: In collaboration with Lora Aroyo and Praveen Paritosh.

By: Peter Mattson (Staff Engineer)
Source: Google Cloud Blog

