# WIP: bulk downloader for public data

Requires [Node.js](https://nodejs.org) and [pnpm](https://pnpm.io/).

```
pnpm install
```

## running

```
# download the datos.gob.ar portal
pnpm run run https://datos.gob.ar/data.json
# saves to data/datos.gob.ar_data.json

# download all known portals
pnpm run run
# saves to data/*
```

## container

```
docker run --rm -it -v ./data:/data gitea.nulo.in/nulo/transicion-desordenada-diablo/downloader
# downloads datos.gob.ar
```

## format of the saved repo

- `{data.json URL without the protocol and with / replaced by _}/`
  - `data.json`
  - `errors.jsonl`: file with all the errors encountered while trying to download everything.
  - `{dataset identifier}/`
    - `{distribution identifier}/`
      - `{fileName (or, if missing, the distribution identifier)}`

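The naming rules above can be sketched in a few lines of JavaScript. This is only an illustration, not the downloader's actual code: `portalDirName` and `distributionPath` are hypothetical helper names, and the sketch assumes the entries nest in the order listed (portal directory, then dataset, then distribution, then file).

```javascript
const path = require("node:path");

// Hypothetical helper: directory name for a portal, i.e. the data.json URL
// without its protocol and with every "/" replaced by "_".
function portalDirName(dataJsonUrl) {
  return dataJsonUrl.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "").replace(/\//g, "_");
}

// Hypothetical helper: relative path of a distribution's file.
// Assumes `dataset.identifier` and `distribution.identifier` are present;
// `distribution.fileName` is optional and falls back to the identifier.
function distributionPath(dataJsonUrl, dataset, distribution) {
  return path.posix.join(
    portalDirName(dataJsonUrl),
    dataset.identifier,
    distribution.identifier,
    distribution.fileName ?? distribution.identifier
  );
}

console.log(portalDirName("https://datos.gob.ar/data.json"));
// prints "datos.gob.ar_data.json"
```

The fallback uses `??` so that only a missing `fileName` triggers reuse of the `identifier`.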
### example

- `datos.gob.ar_data.json/`
  - `data.json`
  - `errors.jsonl`
  - `turismo_fbc269ea-5f71-45b6-b70c-8eb38a03b8db/`
    - `turismo_0774a0bb-71c2-44d7-9ea6-780e6bd06d50/`
      - `cruceristas-por-puerto-residencia-desagregado-por-pais-mes.csv`
    - ...
  - `energia_0d4a18ee-9371-439a-8a94-4f53a9822664/`
    - `energia_9f602b6e-2bef-4ac4-895d-f6ecd6bb1866/`
      - `energia_9f602b6e-2bef-4ac4-895d-f6ecd6bb1866` (this file has no `fileName` in data.json, so the `identifier` is reused)
    - ...
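To inspect what failed after a run, the `errors.jsonl` file in a portal directory can be read line by line. The sketch below assumes the file follows the usual JSON Lines convention (one JSON value per line), which the `.jsonl` extension suggests but this README does not spell out. The shape of each error record is not documented here, so the code treats records as opaque values. `parseJsonLines` and `readErrors` are hypothetical helper names.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Parse JSON Lines text: one JSON value per non-empty line.
function parseJsonLines(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Hypothetical helper: load the errors recorded for one portal directory.
// Returns an empty array when the directory has no errors.jsonl.
function readErrors(portalDir) {
  const file = path.join(portalDir, "errors.jsonl");
  if (!fs.existsSync(file)) return [];
  return parseJsonLines(fs.readFileSync(file, "utf8"));
}

const errors = readErrors("data/datos.gob.ar_data.json");
console.log(`${errors.length} errors recorded`);
```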