• KoboldCoterie@pawb.social
    1 year ago

    While I agree with the sentiment, that’s 2-6 in 10,000,000 images. Even if someone were personally reviewing every image that went into these data sets, which I strongly doubt, that’s a pretty easy mistake to make when looking at that many images.

    • RecallMadness@lemmy.nz
      1 year ago

      “Known CSAM” suggests the researchers ran the dataset through automated detection tools, which the dataset authors could have used as well.

    • Sapphire Velvet@lemmynsfw.com (OP)
      1 year ago

      They’re not looking at the images, though. They’re scraping. And their own legal defenses rely on them not looking too carefully, or else they cede their position to the copyright holders.