About this project

The financial news API is a real-time, continuously updated repository of financial and business-related news articles and wire stories. The API lets you query by search term or stock ticker and retrieve coverage of events related to that query. Beyond the API, our datasets serve as fine-tuning material for LLMs and as training data for general-purpose NLP research and development.

Technical details

Articles are crawled and scraped from various sources, then parsed and cleaned using Selenium and Puppeteer. Next, a summarization model scores each sentence in the article by importance, extracts the most important keywords, and builds a summary from the highest-ranked sentences. Finally, a named entity recognition (NER) model extracts known and relevant entities before the article is indexed.
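The summarization model isn't named here, so the sketch below substitutes a simple frequency-based extractive method purely as an illustration, not the project's actual model. Each sentence is scored by the average frequency of its content words across the article, the most frequent words become keywords, and the top-scoring sentences are returned in their original order. The stop-word list and function names are hypothetical, and the NER step is omitted.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "is",
    "was", "were", "after", "its", "it", "that", "with", "as", "at", "by",
}


def split_sentences(text):
    """Naively split on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, num_sentences=2, num_keywords=5):
    """Return (summary, keywords) using frequency-based extractive scoring."""
    sentences = split_sentences(text)
    freqs = Counter(tokenize(text))
    keywords = [word for word, _ in freqs.most_common(num_keywords)]

    # Score each sentence by the average article-wide frequency of its content words.
    scored = []
    for index, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        score = sum(freqs[t] for t in tokens) / len(tokens) if tokens else 0.0
        scored.append((score, index))

    # Keep the highest-scoring sentences (earlier sentence wins ties) ...
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    # ... then restore their original order so the summary reads naturally.
    chosen = sorted(index for _, index in top)
    return " ".join(sentences[i] for i in chosen), keywords
```

Averaging rather than summing word frequencies keeps long sentences from winning purely on length; a learned model would replace both the scoring and the keyword step.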

The articles are then stored persistently in a single-server PostgreSQL database and indexed in a three-node Elasticsearch cluster for fast, scalable retrieval. The search engine is built on top of Elasticsearch: queries are expressed in its Query DSL, and article results are cached in Redis.
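A minimal sketch of the query-and-cache flow, assuming hypothetical index fields (`title`, `summary`, `body`, `tickers`, `published_at`) and a cache-aside policy with a time-to-live; the project's actual mapping, query shape, and cache policy aren't documented here. The request body uses standard Query DSL clauses (`bool`, `multi_match`, `term`). `TTLCache` is an in-memory stand-in with the same `get` / `set(key, value, ex=...)` shape as a redis-py client, and `search_fn` is a hypothetical callable that sends the body to the cluster.

```python
import hashlib
import json
import time


def build_query(term=None, ticker=None, size=10):
    """Build a Query DSL request body (field names are hypothetical)."""
    must = []
    if term:
        must.append({"multi_match": {"query": term, "fields": ["title^2", "summary", "body"]}})
    filters = []
    if ticker:
        # Exact match on an assumed keyword field holding upper-case tickers.
        filters.append({"term": {"tickers": ticker.upper()}})
    return {
        "size": size,
        "query": {"bool": {"must": must or [{"match_all": {}}], "filter": filters}},
        "sort": [{"published_at": {"order": "desc"}}],
    }


def cache_key(body):
    """Hash a canonical JSON form so identical queries share one cache entry."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return "search:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for Redis GET / SET with an EX expiry in seconds."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value, ex):
        self._store[key] = (value, self._clock() + ex)


def cached_search(body, cache, search_fn, ttl_seconds=60):
    """Cache-aside: return cached results if present, else query and store them."""
    key = cache_key(body)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    results = search_fn(body)
    cache.set(key, json.dumps(results), ex=ttl_seconds)
    return results
```

Because the key is derived from a key-sorted serialization of the whole request body, any change to the search term, ticker filter, or page size yields a distinct cache entry, while repeated identical queries are served from the cache until the TTL lapses.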

Datasets

Datasets are updated at the end of every week.

Roadmap

Some planned improvements to this project: