By Group "Raw Text Connoisseurs"
Three screening steps when collecting the data
In the common case, a company's filings include 10-K and 10-Q reports, so the first screening checks for these forms. The problem at this step is that some companies (typically foreign private issuers) do not file 10-K reports and disclose only 6-K and other documents, which forced us to drop all data for those companies.
Our ultimate goal is a panel dataset, so we want each selected company to have data for as much of the required time frame as possible. This calls for a second screening.
At the same time, because we want to cover a broad range of companies, we need to select both large-cap and small-cap companies, which is the third screening.
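The three screenings described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure we actually ran: the filing records, tickers, market capitalizations, the required years, the full-coverage rule, and the 10 billion USD large-cap threshold are all hypothetical, as are the function names.

```python
from collections import defaultdict

# Hypothetical filing records: (ticker, form type, fiscal year).
filings = [
    ("AAA", "10-K", 2019), ("AAA", "10-Q", 2019),
    ("AAA", "10-K", 2020), ("AAA", "10-K", 2021),
    ("BBB", "6-K", 2019), ("BBB", "6-K", 2020), ("BBB", "6-K", 2021),
    ("CCC", "10-K", 2019), ("CCC", "10-K", 2021),
    ("DDD", "10-K", 2019), ("DDD", "10-K", 2020), ("DDD", "10-K", 2021),
]

# Hypothetical market capitalizations in USD.
market_cap = {"AAA": 250e9, "BBB": 40e9, "CCC": 5e9, "DDD": 0.8e9}

REQUIRED_YEARS = {2019, 2020, 2021}  # assumed time frame of the panel
MIN_COVERAGE = 1.0                   # assumed: a 10-K in every required year
LARGE_CAP_THRESHOLD = 10e9           # assumed cutoff between large and small cap


def screen_by_form(filings, required_form="10-K"):
    """Screening 1: keep companies that filed the required form at least once."""
    return {ticker for ticker, form, _ in filings if form == required_form}


def screen_by_coverage(filings, tickers, required_years, min_coverage, form="10-K"):
    """Screening 2: keep companies whose filings cover enough of the time frame."""
    years = defaultdict(set)
    for ticker, filed_form, year in filings:
        if ticker in tickers and filed_form == form:
            years[ticker].add(year)
    return {
        ticker
        for ticker in tickers
        if len(years[ticker] & required_years) / len(required_years) >= min_coverage
    }


def split_by_market_cap(tickers, market_cap, threshold):
    """Screening 3: split the survivors into large-cap and small-cap groups."""
    large = sorted(t for t in tickers if market_cap[t] >= threshold)
    small = sorted(t for t in tickers if market_cap[t] < threshold)
    return large, small


step1 = screen_by_form(filings)
step2 = screen_by_coverage(filings, step1, REQUIRED_YEARS, MIN_COVERAGE)
large_caps, small_caps = split_by_market_cap(step2, market_cap, LARGE_CAP_THRESHOLD)

print("After form screening:", sorted(step1))
print("After coverage screening:", sorted(step2))
print("Large cap:", large_caps, "| Small cap:", small_caps)
```

In this toy data, BBB is dropped at the first screening because it only has 6-K filings, and CCC is dropped at the second because it is missing one year. Relaxing MIN_COVERAGE would trade panel balance for a larger sample.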
All three screenings require a large number of non-standardized, ad hoc steps. In addition, we did not find any publicly available API for crawling such data, so manual download was unavoidable, which slowed down the whole data acquisition process. We hope this step can be improved, and if you have a better approach, please share it with us.