Note that the answers assume you are collecting the data from an unrestricted source. In the US, a fact cannot be copyrighted, but a collection of facts can be. A specific database may therefore be copyrighted even if its data originally came from open sources, and extensive copying would make your version a Derivative Work at best.
It's not uncommon for databases that are copyrighted but publicly exposed to include some "smoking gun" entries. These don't affect use of the data, but their presence in another resource would demonstrate that it was bulk-copied from this one, and hence (unless specifically authorized) that it is a copyright violation.
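As a rough illustration of how such a check might work, here is a minimal sketch in Python. It assumes the database owner keeps a private list of planted entries and compares it against another dataset. The function names and the sample records are hypothetical, made up for this example, and a match is only evidence of copying, not a legal finding.

```python
def normalize(record):
    """Lowercase and collapse whitespace so trivial reformatting does not hide a match."""
    return " ".join(record.lower().split())


def find_planted_entries(suspect_records, planted_entries):
    """Return the planted entries that also appear in the suspect dataset."""
    suspect = {normalize(r) for r in suspect_records}
    return [e for e in planted_entries if normalize(e) in suspect]


# Hypothetical data: the planted entries are deliberately fictitious records
# that exist only in the owner's database.
planted = ["Example Phantom Lane 42", "Nonexistent Court 13"]
suspect = ["Main Street 1", "  example  phantom lane 42 ", "Oak Avenue 7"]

matches = find_planted_entries(suspect, planted)
print(matches)
```

Exact matching after normalization is the simplest possible choice here. A real check would probably need fuzzier comparison, since a copier may alter the records slightly.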
The format of the data may also be copyrightable, or trademarkable, or qualify for a Design Patent.
Basically, if you aren't sure you have permission to use a dataset, ask. And then remember that they may be wrong. In either direction.