The leading knowledge platform for the financial technology industry

A-Team Insight Blogs

Bloomberg Offers Guidance on Getting Data Annotation Right for Machine Learning

Machine learning has become essential to financial institutions seeking timely business insight and signals of opportunity and risk across the business. At many firms, the technology is being scaled and use cases are proliferating. There are limitations, however: useful outcomes from machine learning models depend on high-quality data that is annotated accurately and consistently.

Data annotation probably isn’t the first thing that comes to mind when considering machine learning projects, but it is crucial to success and often difficult to achieve. With this in mind, Bloomberg has pulled together its expertise in annotation and published it for the use of other organisations.

The publication, Best Practices for Managing Data Annotation Projects, provides a practical guide to planning, executing, and evaluating the annotation step in machine learning projects. It was authored by Amanda Stent, natural language processing (NLP) architect in the office of the CTO; Tina Tseng, legal analyst with Bloomberg Law; and Domenic Maida, chief data officer, global data.

Key considerations of data annotation covered by the publication include how to:

  • Identify stakeholders that should be involved in a project
  • Decide on datasets to be included in the project
  • Write and share annotation guidelines
  • Select an annotation tool
  • Test annotation for correct results and edge cases
  • Select the right team for each project based on the data
  • Ensure consistent communication across the team
  • Manage time and budget to ensure all project data is covered
  • Evaluate annotation quality at the end of the project.
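
As an illustration of the final step, one common way to check annotation consistency is to measure agreement between two annotators who labelled the same items. The article does not say which measure the Bloomberg guide recommends, so the sketch below is an assumption: it computes Cohen's kappa, a standard agreement statistic, in plain Python. The function name and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Assumption: this is one common agreement measure, not necessarily
    the one used in the Bloomberg guide.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four documents from two annotators.
annotator_a = ["earnings", "earnings", "other", "other"]
annotator_b = ["earnings", "other", "other", "other"]
print(cohen_kappa(annotator_a, annotator_b))
```

A value near 1 suggests the annotators apply the guidelines consistently, while a value near 0 suggests agreement no better than chance, which may point to unclear guidelines or unhandled edge cases.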

The authors note that data annotation projects are ongoing processes rather than one-off tasks, and acknowledge the need for a human in the loop ‘as we have more contextual value than computers’.

Bloomberg’s expertise in annotation is built on the need to understand different types and formats of data that flow through its data pipelines and analytics, including earnings releases and tables, PDFs of filings, news articles, and ever-changing information about stocks, maturity dates of bonds, foreign exchange rates, and commodity prices. The company uses and contributes to the open source tool pybossa for data annotation.
