Difference between revisions of "Artificial Intelligence in Journalism"

From SI410

Revision as of 01:12, 11 February 2023

Artificial-intelligence journalism, or AI journalism, is the use of artificial intelligence software and other algorithmic processes to assist in or fully automate aspects of the journalistic process. The AI algorithms involved generally fall into a few broad categories: data collection, content generation, information verification, and news dissemination.[1] The programs in each category play an important role in story production, and their combined output yields articles that require limited human interaction to generate. The behavior and capabilities of any automated software depend on the developer's approach to creating the program, which ultimately dictates the quality of the articles that AI journalism can produce. While there are no guidelines for how such software functions, it typically follows a similar process: an algorithm scans large amounts of provided data, identifies and pieces together key points from that data, and combines this information to generate a human-readable piece. Aspects of writing like tone, expressiveness, and style are customizable and largely dependent on the data on which the AI algorithm was trained.

The idea of automated news has existed for nearly half a century, with its earliest implementations producing numerical and statistical reports on topics like weather and financial news.[2] The automation of more advanced types of journalism and reporting is a newer field, still very much in its infancy. However, the rapid growth of AI technologies over the last decade, driven by increased computing power, reduced costs, and greater investment, has enabled bolder and more powerful algorithms to enter the space, allowing more complex forms of writing to be emulated by software.[3]

The role AI will play in journalism's future is hard to predict.[4] AI journalism holds the potential to relieve humans of menial burdens, increasing efficiency, reducing costs, and giving news agencies more time to delve into complex and pressing topics. At the same time, this technology raises concerns over content quality and could threaten the role of human journalists in the field.[5] The media has a large influence on our societies, communities, and lives, which makes the ethics of introducing automation into it a heavily covered topic.

How AI Journalists Are Made

Artificial intelligence journalists come in a variety of forms and serve a range of purposes. Some are used to label images, categorize data, and perform other tasks that assist their human counterparts, while more complex models might generate complete articles on their own. AI journalism, no matter the model's specialization, relies on Natural Language Generation (NLG), a software process that uses artificial intelligence to generate human-understandable output. NLG has broad use cases, with creative article generation being a highly sought-after goal.[6] AI journalists are developed using fundamental NLG ideas, but the greater the level of automation an algorithm has, the more complex its development process is.

All AI algorithms require access to training data in order to learn and to develop the procedures used to accomplish their task.[7] In the case of AI journalism and content generation, algorithms are trained on preexisting content, such as human-written pieces, large sets of data, or even audio recordings and images. Since these algorithms generate content based on the data they were trained on, they often pick up characteristics from that data. This is an important step, because those learned characteristics are what allow the algorithms to perform specialized tasks efficiently and accurately. However, immense volumes of data are needed to train advanced AI models, especially when the task, like content generation, is very complex.[8]
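The way a generator picks up characteristics from its training data can be illustrated with a toy example. The following is a minimal sketch, not how any production system works: it assumes a simple bigram (word-pair) model, and the three-sentence corpus and the function names `train_bigram_model` and `generate` are hypothetical.

```python
import random
from collections import defaultdict


def train_bigram_model(corpus):
    """Map each word to the list of words that followed it in the corpus."""
    model = defaultdict(list)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            model[current].append(following)
    return model


def generate(model, start, max_words=8, seed=0):
    """Build a phrase by repeatedly sampling a word seen after the last one."""
    rng = random.Random(seed)
    words = [start]
    # Stop at the length limit or when the last word was never followed by anything.
    while len(words) < max_words and model.get(words[-1]):
        words.append(rng.choice(model[words[-1]]))
    return " ".join(words)


# Hypothetical training corpus, invented for illustration only.
corpus = [
    "the home team won the game",
    "the away team lost the game",
    "the home crowd cheered the team",
]
model = train_bigram_model(corpus)
headline = generate(model, "the")
print(headline)
```

Every word the sketch can produce comes from the corpus, so the vocabulary and phrasing of the output mirror the training text. Real systems are far more sophisticated, but the dependence on training data is the same in kind.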

NLG has evolved considerably over the last few decades.[9] Machines are now good at generating analyses and textual summaries of datasets. The application of NLG to more complex forms of writing, such as creative writing or commentary, is much newer and far less developed. However, the field is consistently growing, and many are invested in creating AI with greater writing autonomy. As computational power increases and becomes cheaper, the capability of AI systems to tackle increasingly complex data sets grows, enabling more powerful models that generate higher-quality output and more closely mimic human writing.[10]
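Generating a textual summary from a dataset can be sketched with a simple rule-based template. This is an illustrative assumption rather than a description of any named system: the function `summarize_game`, the field names, the three-point threshold for "edged", and the team names are all hypothetical.

```python
def summarize_game(game):
    """Turn a structured game record into a one-sentence recap."""
    home, away = game["home"], game["away"]
    home_pts, away_pts = game["home_score"], game["away_score"]

    if home_pts == away_pts:
        return f"{home} and {away} tied {home_pts}-{away_pts}."

    winner, loser = (home, away) if home_pts > away_pts else (away, home)
    high, low = max(home_pts, away_pts), min(home_pts, away_pts)

    # Word choice depends on the data: a narrow margin gets a different verb.
    verb = "edged" if high - low <= 3 else "beat"
    return f"{winner} {verb} {loser} {high}-{low}."


# Hypothetical record, invented for illustration only.
example = {
    "home": "Lincoln High",
    "away": "Central High",
    "home_score": 21,
    "away_score": 17,
}
print(summarize_game(example))  # prints "Lincoln High beat Central High 21-17."
```

Templates like this handle numerical reports well, which is consistent with automation having started with statistical topics. Commentary and creative writing do not reduce to a fixed set of rules, which is part of why they remain less developed.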

Notable Robot Journalists

Heliograf

Heliograf is the Washington Post's homegrown AI journalist, initially conceived to cover the 2016 Rio Olympics.[11] Since then, Heliograf has been repurposed for a variety of situations, ranging from the coverage of local sports games to congressional races and events. In 2017, the bot published nearly 850 articles.

StatSheet

StatSheet was a fully automated online sports content network. Its publications provided detailed statistics on various American sports, gathered through automated collection processes.

CNET

CNET is an American media website that publishes a variety of content, including articles, reviews, videos, and podcasts. Since the site's inception in 1994, it has primarily focused on technology topics. In November 2022, CNET stepped into the field of AI journalism and began using automation technology to generate articles. Many of the published articles were primarily written by artificial intelligence.[12] CNET did not disclose that it was using machines to write articles, instead publishing the pieces under the byline "CNET Money Staff", which left readers unaware that they were reading AI-generated content. This drew heavy criticism, especially when the AI-generated articles were found to contain inaccuracies, calculation errors, and misleading information that would normally be caught in the editorial process.[13] CNET faced additional criticism when its automated content generation algorithms were found to be plagiarizing human-written pieces without citing them.[14] Critics of AI technology in journalism have cited CNET's rocky start as evidence against further investment in the field. Proponents of the technology, however, see this example as an opportunity and believe that embracing new ideas and addressing these issues in the early stages will help create a better future for journalism.

Applications

Cost Reduction

Proponents of AI journalism cite the drastically reduced costs associated with handing more of the work over to AI. Over the last two decades, the news industry has been somewhat stagnant and has shown signs of decline in profitability and growth.[15] With decreasing budgets, it is becoming harder for local news agencies to stay afloat, which has been shown to damage the structure of communities.[16] Proponents argue that AI in journalism would address this issue, since automation lowers costs. Cutting many human costs can help news agencies withstand greater financial pressure while still maintaining quality.

Speed

The use of powerful algorithms in journalism would enable news to spread much faster. The moment the software receives data, it can begin to formulate an article on the topic, capturing the information that is relevant to readers. This eliminates the barriers and time sinks associated with human production of content. Important information can be generated and published within minutes of an event occurring. Increasing the speed and flow of information improves readers' ability to understand and stay updated on various situations.

Automating manual and menial journalistic processes enables journalists to dedicate more time to producing higher-quality content. Software can analyze and sort through large quantities of information far faster than its human counterparts.[17] Removing the need for human intervention in tasks like calculations and other low-level analyses allows journalists to dedicate time to the more complex aspects of journalism. With the assistance of software, human journalists would be able to develop a much more in-depth analysis of their data.[18] Pairing journalistic skills with the quality of data extracted by computers would enable journalists to draw more meaningful conclusions and insights from their work. Thus, many supporters of AI journalism do not believe that AI will kill the journalism field or the role of humans within it. Instead, they believe AI will be the future of the field and the next step for journalists.[19] Proponents argue that AI-based tools will never replace human journalists, but will instead shift their responsibilities and strengthen their ability to produce more in-depth content.[20]

Ethical Concerns

Algorithmic Bias

As explained earlier, designing AI systems is a rigorous process that requires strong control over each step. Programmers design the underlying algorithms, select and process the data to train the algorithms on, and determine how to apply the results of the algorithm to create the AI system. This finely tuned process is what enables the objective, data-driven decisions that AI can make. However, when not carefully supervised, it can allow bias to enter the system. Bias can emerge from many unanticipated factors, such as the design of the algorithm or decisions about how the training data was coded and collected.[21] In computer science, there is a principle known as garbage in, garbage out: flawed input produces flawed output. When an algorithm is trained on flawed data, biases from that data can unwittingly be baked into the algorithm's decision-making process. In the scope of AI journalism, this means that content generation algorithms can end up taking biased views on topics if they were trained on data that was itself biased.[22]
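The garbage in, garbage out principle can be demonstrated with a deliberately tiny model. This is a minimal sketch under stated assumptions: it uses a naive word-count classifier rather than any real newsroom system, and the neighborhood names, headlines, and labels are hypothetical training data constructed to be skewed.

```python
from collections import Counter


def train(examples):
    """Count how often each word appears under each label."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def classify(counts, text):
    """Label text by which class has seen its words more often (ties go positive)."""
    words = text.lower().split()
    pos = sum(counts["positive"][w] for w in words)
    neg = sum(counts["negative"][w] for w in words)
    return "positive" if pos >= neg else "negative"


# Hypothetical, deliberately skewed data: one neighborhood only ever
# appears in positive stories, the other only in negative ones.
skewed = [
    ("northside council approves budget", "positive"),
    ("northside school wins award", "positive"),
    ("southside council delays budget", "negative"),
    ("southside school loses funding", "negative"),
]
counts = train(skewed)

# The same neutral event is framed differently based only on the place name.
print(classify(counts, "northside festival opens"))  # prints "positive"
print(classify(counts, "southside festival opens"))  # prints "negative"
```

Nothing in the code singles out either neighborhood. The different treatment comes entirely from the training examples, which is how bias can enter a system without anyone intending it.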

Apart from poor data, the creators of an algorithm can themselves have a large impact on the biases it adopts. Without diverse teams and rigorous testing, it is easy for an individual or a small group of developers to let subtle, unconscious biases enter their models, which AI then automates and perpetuates at large scale.[23]

When algorithms are created improperly and biases are present, the consequences can be serious even if the biases are unintended. AI journalism, just like conventional journalism, has a broad reach, because society as a whole consumes the media. As a result, poorly created AI journalists can have ramifications at a societal level by introducing media bias. Media bias is a powerful force, and when spread widely enough, it can lead to inequitable outcomes for society's most vulnerable groups.[24] Because of how certain races and ethnic groups were treated in the past, data can often contain hidden biases that influence the decision-making of algorithms. Media bias is generally harmful, as biased information can lead to negative social outcomes such as group polarization, intolerance of dissent, and reduced efficiency in collective decision-making.[25]

The efficiency of algorithmic content generation allows bias to spread rapidly and widely if AI models are created irresponsibly and without appropriate checks in place. It is therefore important for algorithms used in media generation to be trained on carefully selected data, to reduce the risk of bias emerging in the news.

Future of Employment and Authorship

Another concern raised by opponents of AI journalism is the potential impact on human employment in the field. Over the last decade, the number of professional journalists and editors has already declined sharply.[26] Introducing more automation is feared to exacerbate this issue and remove more individuals from these positions. As companies seek to minimize costs, the transition to AI journalists presents an effective way to do so. Roles such as data collection, analysis, and low-level fact checking, where computers can outperform humans, would see shifts in workforce makeup.

Furthermore, it is argued that AI journalism presents a challenge surrounding authorship. Journalism is often considered an art form, and automation takes away from that.[27] When a computer generates a piece of content, it is difficult to identify who should be listed as its creator. It is not obvious whether the developer of the algorithm, the humans working alongside the software, or the authors of the pieces on which the AI system was trained should be credited. This issue removes transparency from the journalistic process and makes it difficult to identify the parties responsible for a piece.

References

  1. https://www.mdpi.com/2673-5172/2/2/14
  2. https://towcenter.gitbooks.io/guide-to-automated-journalism/content/status_quo_of/the_state_of.html
  3. https://www.wired.com/insights/2015/03/ai-resurgence-now/
  4. https://www.washingtonpost.com/business/energy/why-the-future-of-technology-is-so-hard-to-predict/2022/12/28/57fd3ac2-86b0-11ed-b5ac-411280b122ef_story.html
  5. https://www.tandfonline.com/doi/abs/10.1080/21670811.2016.1209083?journalCode=rdij20
  6. https://www.huffpost.com/entry/philip-parker-books_n_2648820
  7. https://stagezero.ai/blog/what-is-training-data/#:~:text=Artificial%20Intelligence%20(AI)%20and%20machine,the%20full%20potential%20of%20AI.
  8. https://graphite-note.com/how-much-data-is-needed-for-machine-learning#:~:text=Generally%20speaking%2C%20the%20rule%20of,100%20rows%20for%20optimal%20results.
  9. https://www.marketingaiinstitute.com/blog/the-beginners-guide-to-using-natural-language-generation-to-scale-content-marketing
  10. https://www.encora.com/insights/the-increase-in-computer-power-is-driving-applied-ai#:~:text=We've%20seen%20an%20increase,data%20into%20their%20training%20processes.
  11. https://digiday.com/media/washington-posts-robot-reporter-published-500-articles-last-year/
  12. https://futurism.com/the-byte/cnet-publishing-articles-by-ai
  13. https://www.engadget.com/cnet-reviewing-ai-written-articles-serious-errors-113041405.html
  14. https://futurism.com/cnet-ai-plagiarism
  15. https://www.pewresearch.org/fact-tank/2022/10/13/after-increasing-in-2020-layoffs-at-large-u-s-newspapers-and-digital-news-sites-declined-in-2021/
  16. https://www.cbsnews.com/news/local-news-financial-firms-60-minutes-2022-06-12/
  17. https://www.tandfonline.com/doi/abs/10.1080/10714421.2015.1031996?journalCode=gcrv20
  18. https://thenextweb.com/news/face-it-ai-is-better-at-data-analysis-than-humans
  19. https://www.niemanlab.org/2022/12/ai-enters-the-newsroom/
  20. https://www.niemanlab.org/2022/12/ai-enters-the-newsroom/
  21. https://hbr.org/2020/11/a-simple-tactic-that-could-help-reduce-bias-in-ai#:~:text=credit%2C%20and%20more.-,It's%20been%20well%2Destablished%20that%20AI%2Ddriven%20systems%20are%20subject,by%20experts%20with%20implicit%20biases.
  22. https://www.pwc.com/us/en/tech-effect/ai-analytics/algorithmic-bias-and-trust-in-ai.html#:~:text=The%20short%20answer%3A%20People%20write,AI%20then%20automates%20and%20perpetuates.
  23. https://www.pwc.com/us/en/tech-effect/ai-analytics/algorithmic-bias-and-trust-in-ai.html#:~:text=The%20short%20answer%3A%20People%20write,AI%20then%20automates%20and%20perpetuates.
  24. https://unitedwaysem.org/equity_challenge/day-4-understanding-our-bias-the-consequences-of-bias/#:~:text=Bias%20can%20be%20dangerous%20and,influence%20actions%20that%20are%20discriminatory.
  25. Dominic Spohr. 2017. Fake news and ideological polarization: Filter bubbles and selective exposure on social media. Business Information Review 34, 3 (2017), 150–160.
  26. https://www.poynter.org/reporting-editing/2015/newspaper-industry-lost-3800-full-time-editorial-professionals-in-2014/
  27. https://www.fourstatesliving.com/feature-stories/2018/1/31/journalism-is-an-art-form#:~:text=The%20art%20of%20storytelling%20is,discovered%20early%20on%20in%20life.