Public Dataset Archive 25
Natural Language
A Free Public Dataset
A high-quality, verifiable public dataset (#25) tailored for machine learning pre-training.
License: MIT
Format: Parquet
Dataset size: 79 GB