A-Team Insight Blogs

Bloomberg Offers Guidance on Getting Data Annotation Right for Machine Learning


Machine learning has become essential to financial institutions seeking timely business insight and signals of opportunity and risk across the business. At many firms, the technology is being scaled and use cases are proliferating. There is a catch, however: useful outcomes from machine learning models depend on high-quality data that is annotated accurately and consistently.

Data annotation probably isn’t the first thing that comes to mind when considering machine learning projects, but it is crucial to their success and often difficult to get right. With this in mind, Bloomberg has pulled together its expertise in annotation and published it for other organisations to use.

The publication, Best Practices for Managing Data Annotation Projects, provides a practical guide to planning, executing, and evaluating the annotation step in machine learning projects. It was authored by Amanda Stent, natural language processing (NLP) architect in the office of the CTO; Tina Tseng, legal analyst with Bloomberg Law; and Domenic Maida, chief data officer, global data.

Key considerations covered by the publication include how to:

  • Identify stakeholders that should be involved in a project
  • Decide on datasets to be included in the project
  • Write and share annotation guidelines
  • Select an annotation tool
  • Test annotation for correct results and edge cases
  • Select the right team for each project based on the data
  • Ensure consistent communication across the team
  • Manage time and budget to ensure all project data is covered
  • Evaluate annotation quality at the end of the project.
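To make the final step more concrete: one widely used way to measure how consistently two annotators label the same items is Cohen’s kappa, which compares their observed agreement with the agreement expected by chance. The Python sketch below is an illustration only, not a method taken from the Bloomberg guide; the function name `cohens_kappa` and the sample sentiment labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items both annotators labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability both pick the same label independently,
    # based on each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout: perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical sentiment labels from two annotators on six news headlines.
annotator_1 = ["pos", "neg", "neu", "pos", "neg", "pos"]
annotator_2 = ["pos", "neg", "pos", "pos", "neu", "pos"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # prints 0.429
```

Here the annotators agree on four of six items (about 0.67), but after discounting chance agreement the kappa is roughly 0.43, a signal that the annotation guidelines may need tightening before the project scales.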

The authors note that data annotation projects are ongoing processes rather than one-off tasks, and acknowledge the need for a human in the loop ‘as we have more contextual value than computers’.

Bloomberg’s expertise in annotation is built on the need to understand the different types and formats of data that flow through its data pipelines and analytics, including earnings releases and tables, PDFs of filings, news articles, and ever-changing information about stocks, maturity dates of bonds, foreign exchange rates, and commodity prices. The company uses and contributes to the open-source data annotation tool pybossa.
