From 254a06809129855a4f891a490381ae713f004ae2 Mon Sep 17 00:00:00 2001
From: Nulo
Date: Wed, 27 Dec 2023 22:16:20 -0300
Subject: [PATCH] *** they are WARCs, not tar

---
 readme.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/readme.md b/readme.md
index f7138b4..59d4fb0 100644
--- a/readme.md
+++ b/readme.md
@@ -22,7 +22,7 @@ start by downloading a WARC with 50 sample pages, and recompress it with zstd:
 
 ```
 wget --no-verbose --tries=3 --delete-after --input-file ./data/samples/Dia.txt --warc-file=dia-sample
-gzip -dc dia-sample.warc.gz | zstd --long -15 --no-sparse -o dia-sample.tar.zst
+gzip -dc dia-sample.warc.gz | zstd --long -15 --no-sparse -o dia-sample.warc.zst
 ```
 
 then, scrape it into a DB:
@@ -30,7 +30,7 @@ then, scrape it into a DB:
 ```
 cd scraper/
 bun install
-bun cli.ts scrap ../dia-sample.tar.zst
+bun cli.ts scrap ../dia-sample.warc.zst
 ```
 
 now view it on the site: