The knowledge platform for the financial technology industry

A-Team Insight Blogs

Bloomberg Offers Guidance on Getting Data Annotation Right for Machine Learning


Machine learning has become essential to financial institutions seeking timely business insight and signals of opportunity and risk across the business. At many firms, the technology is being scaled and use cases are proliferating. There is a catch, however: useful outcomes from machine learning models depend on high-quality data that is annotated accurately and consistently.

Data annotation probably isn’t the first thing that comes to mind when considering machine learning projects, but it is crucial to success and often difficult to get right. With this in mind, Bloomberg has pulled together its expertise in annotation and published it for other organisations to use.

The publication, Best Practices for Managing Data Annotation Projects, provides a practical guide to planning, executing, and evaluating the annotation step in machine learning projects. It was authored by Amanda Stent, natural language processing (NLP) architect in the office of the CTO; Tina Tseng, legal analyst with Bloomberg Law; and Domenic Maida, chief data officer, global data.

Key considerations of data annotation covered by the publication include how to:

  • Identify stakeholders that should be involved in a project
  • Decide on datasets to be included in the project
  • Write and share annotation guidelines
  • Select an annotation tool
  • Test annotation for correct results and edge cases
  • Select the right team for each project based on the data
  • Ensure consistent communication across the team
  • Manage time and budget to ensure all project data is covered
  • Evaluate annotation quality at the end of the project
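The article does not say which quality measures the guide recommends. One widely used way to check that annotations are consistent is to have two annotators label the same sample of items and compute Cohen's kappa, which corrects their raw agreement for the agreement expected by chance. The sketch below illustrates that technique under this assumption; it is not Bloomberg's method, and the function name `cohens_kappa` and the sentiment labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same, non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability both pick the same label if each labelled
    # at random according to their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical sentiment labels from two annotators for the same six headlines.
annotator_1 = ["pos", "neg", "neu", "pos", "neg", "pos"]
annotator_2 = ["pos", "neg", "pos", "pos", "neu", "pos"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.429
```

Here the annotators agree on four of six items (about 67%), but after correcting for chance the kappa is roughly 0.43. A low score like this would typically prompt a review of the annotation guidelines or extra annotator training before labelling continues.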

The authors note that data annotation projects are ongoing processes rather than one-off tasks, and acknowledge the need for a human in the loop ‘as we have more contextual value than computers’.

Bloomberg’s expertise in annotation is built on the need to understand different types and formats of data that flow through its data pipelines and analytics, including earnings releases and tables, PDFs of filings, news articles, and ever-changing information about stocks, maturity dates of bonds, foreign exchange rates, and commodity prices. The company uses and contributes to the open source tool pybossa for data annotation.
