Artificial Intelligence in Journalism

Artificial-intelligence journalism, or AI journalism, is the use of artificial intelligence software and other algorithmic processes to assist in or fully automate aspects of the journalistic process. The algorithms involved generally fall into a few broad categories: data collection, content generation, information verification, and news dissemination.[1] Programs in each category play a distinct role in story production, and the combination of their outputs yields articles that require limited human involvement to produce. The behavior and capabilities of any automated system depend on the developer's approach to creating the program, which ultimately dictates the quality of the articles AI journalism can create. While there is no standard specification for how such software must work, most systems follow a similar process: an algorithm scans large amounts of supplied data, identifies and pieces together key points from that data, and combines this information into a human-readable piece. Aspects of the writing such as tone, expressiveness, and style are customizable, and depend largely on the data the algorithm was trained on.
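
The scan-extract-generate process described above can be illustrated with a minimal Python sketch. Everything here is hypothetical: the function names (scan_data, extract_key_points, render_article), the salience rule, and the fixed sentence template are illustrative assumptions, and production systems replace the final step with trained language models rather than templates.

def scan_data(records):
    """Scan the provided data and pull out candidate facts."""
    # A "fact" here is simply a (subject, value) pair from the records.
    return [(r["subject"], r["value"]) for r in records]

def extract_key_points(facts, top_n=2):
    """Identify key points; the largest values stand in for salience."""
    return sorted(facts, key=lambda f: f[1], reverse=True)[:top_n]

def render_article(key_points, topic):
    """Combine the key points into a human-readable piece via a template."""
    sentences = [f"{subject} came in at {value}." for subject, value in key_points]
    return f"Report on {topic}: " + " ".join(sentences)

# Toy example: an "earnings recap" generated from structured data.
records = [{"subject": "Q3 revenue", "value": 120},
           {"subject": "Q3 costs", "value": 90},
           {"subject": "Q3 profit", "value": 30}]
print(render_article(extract_key_points(scan_data(records)), "ACME quarterly results"))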

The idea of automated news has existed for nearly half a century, with its earliest implementations producing numerical and statistical reports on topics like weather and financial news.[2] The automation of more advanced types of journalism and reporting is a newer field, still in its infancy. However, the rapid growth of AI technologies over the last decade, driven by increased computing power, reduced costs, and greater investment, has enabled bolder and more powerful algorithms to enter the space, allowing software to emulate more complex forms of writing.[3] The role AI will play in journalism's future is difficult to predict.[4] AI journalism holds the potential to relieve humans of menial burdens, increasing efficiency, reducing costs, and giving news agencies more time to delve into complex and pressing topics. At the same time, the technology raises concerns over content quality and threatens the role of human journalists in the field.[5] Because media exerts enormous influence on societies, communities, and individual lives, the ethical concerns of introducing automation into journalism are a heavily covered topic.

[Figure] The History of Natural Language Generation and Automated Journalism, 2007-2018[6]

How AI Journalists Are Made

Artificial intelligence journalists come in a variety of forms and serve a range of purposes. Some are used to label images, categorize data, and perform other tasks that assist their human counterparts, while more complex models can generate complete articles on their own. AI journalism, regardless of a model's specialization, relies on Natural Language Generation (NLG), a software process that uses artificial intelligence to produce human-readable output. NLG has broad use cases, with creative article generation being a highly sought-after goal.[7] AI journalists are developed using fundamental NLG ideas, but the greater the level of automation an algorithm has, the more complex its development process becomes.

All AI algorithms require access to training data in order to learn and develop the procedures used to accomplish their task.[8] In the case of AI journalism and content generation, algorithms are trained on preexisting content, such as human-written pieces, large sets of data, or even audio and images. Because these algorithms generate content based on the data they learned from, they pick up characteristics of that data; this is what allows them to perform specialized tasks efficiently and accurately. However, immense volumes of data are needed to train advanced AI models, especially for a task as complex as content generation.[9]
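
To make the idea of inheriting characteristics from training data concrete, here is a deliberately tiny Python sketch: a bigram model that learns word transitions from a small corpus and can only recombine what it has seen. The corpus is invented for illustration; real systems train neural models on vastly larger datasets.

import random
from collections import defaultdict

def train_bigram_model(corpus):
    """Record which words follow which in the training text."""
    model = defaultdict(list)
    words = corpus.split()
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model

def generate(model, start, length=8):
    """Produce text by walking the learned transitions."""
    out = [start]
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:  # no observed continuation; stop
            break
        out.append(random.choice(followers))
    return " ".join(out)

# The output's vocabulary and "style" come entirely from this corpus.
corpus = ("the market rose sharply today . the market fell slightly today . "
          "analysts said the market remains volatile .")
print(generate(train_bigram_model(corpus), "the"))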

NLG has evolved considerably over the last few decades.[10] Machines are now adept at generating analyses and textual summaries of datasets. The application of NLG to more complex forms of writing, such as creative writing or commentary, is much newer and far less developed. However, the field is growing steadily, and many organizations are invested in creating AI with greater writing autonomy. As computational power increases and becomes cheaper, AI systems can tackle increasingly complex datasets, enabling more powerful models that generate higher-quality output and more closely mimic human writing.[11]
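
The dataset-summarization ability mentioned above is often template-driven. The following Python sketch turns a box score into a one-sentence recap, roughly in the spirit of early sports bots; the team names, thresholds, and verb choices are hypothetical simplifications of the richer rules commercial NLG systems encode.

def summarize_game(home, away, home_score, away_score):
    """Turn a final score into a one-sentence game recap."""
    winner, loser = (home, away) if home_score > away_score else (away, home)
    margin = abs(home_score - away_score)
    # Hypothetical rule: pick a verb based on the margin of victory.
    verb = "routed" if margin >= 20 else "beat" if margin >= 10 else "edged"
    return (f"{winner} {verb} {loser} "
            f"{max(home_score, away_score)}-{min(home_score, away_score)}.")

print(summarize_game("Michigan", "Ohio State", 78, 62))
# -> Michigan beat Ohio State 78-62.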

Notable AI Journalists

Heliograf

Heliograf is the Washington Post's homegrown AI journalist, initially conceived to cover the 2016 Rio Olympics.[12] Since then, Heliograf has been repurposed for a variety of situations, ranging from coverage of local sports games to congressional races and events. In its first year, the bot published roughly 850 articles.

StatSheet

One of the earliest implementations of content automation, StatSheet was a fully automated online sports content network. Its publications provided detailed statistics on various American sports, gathered through automatic collection processes.

CNET

CNET is an American media website that publishes a variety of content, including articles, reviews, videos, and podcasts. Since the site's inception in 1994, it has focused primarily on technology topics. In November 2022, CNET stepped into the field of AI journalism and began employing automation technology to generate articles, many of which were primarily written by artificial intelligence.[13] CNET did not disclose that it was using machines to write articles, instead publishing the pieces under the byline CNET Money Staff, leaving readers unaware that they were reading AI-generated content. This drew heavy criticism, especially when the AI-generated articles were found to be riddled with inaccuracies, calculation errors, and misleading information that would normally be caught in the editorial process.[14] CNET faced additional criticism when its automated content generation algorithms were found to be plagiarizing human-written pieces without citing them.[15] CNET's exploration of AI journalism has thus been marked by challenges regarding accuracy and plagiarism. Critics of AI in journalism have cited CNET's rocky start as evidence against further investment in the field. Proponents, however, see CNET's challenges as an opportunity, believing that embracing new ideas and addressing these issues early will help create a better future for journalism.

Applications and Potential Benefits

Cost Reduction

Proponents of AI journalism cite the drastically reduced costs associated with handing more of the work over to AI. Over the last two decades, the news industry has stagnated and shown signs of declining profitability and growth.[16] With shrinking budgets, it is becoming harder for local news agencies to stay afloat, which has been shown to damage the structure of communities.[17] The advent of AI in journalism would supposedly address this issue, as automation lowers production costs. Cutting costs can help news agencies withstand financial pressure by eliminating many human labor costs while maintaining quality.

Speed

The use of powerful algorithms in journalism would enable news to spread much faster. The moment the software receives data, it can begin to formulate an article on the topic, capturing the information most relevant to readers. This eliminates the delays associated with human production of content: important information can be generated and published within minutes of an event occurring. Increasing the speed and flow of information improves readers' ability to understand and stay updated on developing situations.

Automating manual and menial journalistic processes enables journalists to dedicate more time to producing higher-quality content. Software can analyze and sort through large quantities of information at rates far exceeding those of its human counterparts.[18] Removing the need for human intervention in tasks like calculations and other low-level analyses allows journalists to dedicate time to the more complex aspects of journalism. With the assistance of software, human journalists can develop much deeper analyses of their data.[19] Pairing journalistic skills with the quality of data extracted by computers would enable journalists to draw much more meaningful conclusions and insights from their work. Thus, many supporters of AI journalism do not believe that AI will kill the field or the role of human journalists. Instead, they believe AI is the future of the field and the next step for journalists.[20] Proponents argue that AI-based tools will never replace human journalists, but will instead shift their responsibilities and empower them to produce more in-depth content.[21]
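
As a sketch of the kind of low-level analysis that could be delegated to software, the Python snippet below flags unusual values in a data series so a reporter can focus on interpreting them. The dataset and the two-standard-deviation threshold are hypothetical choices made for illustration.

from statistics import mean, stdev

def flag_outliers(series, threshold=2.0):
    """Return (label, value) pairs lying more than `threshold`
    standard deviations from the mean of the series."""
    values = [value for _, value in series]
    mu, sigma = mean(values), stdev(values)
    return [(label, value) for label, value in series
            if abs(value - mu) > threshold * sigma]

# Hypothetical example: monthly spending figures under review.
spending = [("Jan", 101), ("Feb", 98), ("Mar", 103),
            ("Apr", 99), ("May", 240), ("Jun", 102)]
print(flag_outliers(spending))  # -> [('May', 240)], the month worth a story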

Ethical Concerns

AI content generation algorithms that are implemented irresponsibly have the potential to create serious ethical issues.

Accuracy and the Weaponization of Disinformation

An ethical concern with AI journalism is the accuracy of generated content. Current AI models are adept at generating slick language that reads as if a human wrote it. However, AI is unable to explain its output or the process it used to arrive at it. As a result, such models have no real understanding of what they are generating, and they state both facts and falsehoods with the same high level of confidence.[22] An often-cited example is CNET's automated AI journalist, whose articles were riddled with inaccuracies and misleading information that the algorithm was unable to detect when generating them. Similarly, ChatGPT, a powerful AI language model, is capable of fabricating sources to support answers it is unsure of when addressing user questions.[23]
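
One partial safeguard newsrooms could apply is an automated check that every figure in a generated draft actually appears in the source data. The Python sketch below is a deliberately naive illustration of that idea, not a description of any production fact-checking system; verifying claims in general is far harder than matching numbers.

import re

def unverified_numbers(draft, source_values):
    """Return numbers cited in the draft that are absent from the source data."""
    cited = {float(n) for n in re.findall(r"\d+(?:\.\d+)?", draft)}
    return sorted(cited - {float(v) for v in source_values})

source_values = [3.2, 2024, 118]  # figures present in the underlying dataset
draft = "In 2024, inflation fell to 3.2 percent while 120 firms reported losses."
print(unverified_numbers(draft, source_values))  # -> [120.0], flag for review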

Misinformation has the potential for negative societal implications and poses an ethical challenge, particularly when spread maliciously. It can polarize public opinion, promote extremism, and undermine trust in democratic processes.[24] The 2016 US elections are cited as a prominent example of the dangers of misinformation, with Mark Zuckerberg admitting that 126 million Americans were shown Russian-backed, politically oriented fake news stories via Facebook during the presidential campaign.[25] The misinformation that was spread had a polarizing and potentially significant impact on American public opinion during the election period. By training algorithms on false data and tuning output toward misinformation, bad actors can use AI to generate misleading information. As the cost of producing AI algorithms falls, advanced technologies will no longer be limited to large companies that invest heavily in development.[26] As AI technology becomes more accessible, its potential to be used to generate disinformation increases: those with malicious intent will be able to produce and spread misleading information at scale. Disinformation competes with real news, drowning it out and reducing its impact. The weaponization of false information also undermines the credibility of legitimate media coverage and can lead to a variety of harms, including political polarization and the manipulation of public opinion.[27]

Future of Journalism

Employment of Human Journalists

Another concern presented by opponents of AI journalism is the potential impact on human employment in the field. Over the last decade, journalism and the number of professional editors working in it have already experienced a large decline.[28] Introducing greater automation into the field is feared to exacerbate this trend and displace more individuals from these positions. As companies seek to minimize costs, the transition to AI journalists presents an effective way to do so. Roles centered on data collection, analysis, and low-level fact checking, where computers can outperform humans, would see shifts in workforce makeup. News outlets may also use AI journalists as leverage for predatory tactics toward human journalists.

Increasing Power Gaps in Journalism

Developing AI technology, particularly cutting-edge systems, is expensive and resource intensive. Large news and media companies can invest in such technologies, but smaller outlets are limited to avoiding the technology or licensing software from other companies. This could reduce the control smaller outlets have over the content their software produces, making it more difficult for them to compete and limiting the diversity of media sources.

Authorship

Identifying the Author

Another criticism of AI journalism is the challenge surrounding authorship. Journalism is often considered an art form, and automation takes away from that.[29] When a computer generates a piece of content, it is difficult to identify who should be listed as its creator: the developer of the algorithm, the humans working alongside the software, or the authors of the pieces on which the AI system was trained. This ambiguity removes transparency from the journalistic process and makes it difficult to identify the parties responsible for a piece.

Copyright and Fair Use

Like many new technologies, AI journalism may challenge copyright law.[30] AI content generators often gather data from various sources on the internet. The information an algorithm gathers might be accurate, but that does not necessarily mean the algorithm had the right to collect and integrate the data in the first place.[31]

Algorithmic Bias

As explained earlier, designing AI systems is a rigorous process that requires strong control over each step. Programmers design the underlying algorithms, select and process the data the algorithms are trained on, and determine how the algorithms' results are applied. This finely tuned process is what enables the objective, data-driven decisions AI can make; when not carefully supervised, however, it can allow bias to enter the system. Bias can emerge from many unanticipated factors, such as the design of the algorithm or decisions about how the training data was coded and collected.[32] In computer science there is a principle known as "garbage in, garbage out": flawed input produces flawed output. When this principle is violated during training, biases in the training data can unwittingly be baked into the algorithm's decision-making process. In the scope of AI journalism, this means that content generation algorithms can learn to take biased views on topics if the data they were trained on was itself biased.[33]
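
A contrived Python sketch of "garbage in, garbage out": the toy model below counts how often topic words co-occur with negative words in a deliberately skewed training corpus, and ends up treating one neighborhood as more negative than another. The headlines and word lists are invented; bias in real trained models is subtler, but it arises by the same mechanism.

from collections import Counter

NEGATIVE = {"plunges", "scandal", "fails"}

def train_association(headlines):
    """Count co-occurrences of each word with negative words."""
    assoc = Counter()
    for headline in headlines:
        words = set(headline.lower().split())
        for word in words - NEGATIVE:
            assoc[word] += len(words & NEGATIVE)
    return assoc

# Skewed training data: "downtown" always appears alongside bad news.
headlines = ["downtown market plunges", "downtown council scandal",
             "uptown market rallies", "uptown festival succeeds"]
assoc = train_association(headlines)
print(assoc["downtown"], assoc["uptown"])  # -> 2 0: a learned, unfounded bias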

Apart from poor data, the creators of an algorithm can themselves have a large impact on the biases it adopts. Without diverse teams and rigorous testing, it is easy for an individual or a small group of developers to let subtle, unconscious biases enter their models, which AI then automates and perpetuates at scale.[34] Even when unintended, biases in poorly built algorithms can have serious consequences. AI journalism, like traditional journalism, has a broad reach, since society as a whole consumes the media; poorly created AI journalists can therefore introduce media bias with ramifications at a societal level. Media bias is dangerous, and when spread widely enough it can lead to inequitable outcomes for society's most vulnerable groups.[35] Because of how certain racial and ethnic groups were treated in the past, data can contain hidden biases that influence the decisions algorithms make. Media bias is broadly harmful: biased information can lead to negative social outcomes such as group polarization, intolerance of dissent, and reduced efficiency in collective decision making.[36]

The efficiency of algorithmic content generation gives bias the means to spread rapidly and widely if AI models are created irresponsibly and without appropriate checks in place. It is therefore important for media generation algorithms to be trained on the right data and to incorporate a diverse set of views, limiting the effect of bias on generated news.

References

  1. Kotenidis, E.; Veglis, A. (2021). Algorithmic Journalism—Current Applications and Future Perspectives. Journalism and Media, 2, 244-257. https://doi.org/10.3390/journalmedia2020014
  2. Tow Center. (2019). The State of Automated Journalism in Newsrooms | Guide to Automated Journalism. GitBooks. https://towcenter.gitbooks.io/guide-to-automated-journalism/content/status_quo_of/the_state_of.html
  3. Babak. (2015, March 24). The AI Resurgence: Why Now? WIRED. https://www.wired.com/insights/2015/03/ai-resurgence-now/
  4. Flam, F. (2022, December 28). Why the Future of Technology Is So Hard to Predict. The Washington Post. https://www.washingtonpost.com/business/energy/why-the-future-of-technology-is-so-hard-to-predict/2022/12/28/57fd3ac2-86b0-11ed-b5ac-411280b122ef_story.html
  5. I, Robot. You, Journalist. Who is the Author? (2017). Digital Journalism. https://www.tandfonline.com/doi/abs/10.1080/21670811.2016.1209083?journalCode=rdij20
  6. Automated Insights. The History of Natural Language Generation. Medium. Retrieved February 10, 2023, from https://medium.com/@AutomatedInsights/the-history-of-natural-language-generation-5b4c3fa2f9f9
  7. Bosker, B. (2013, February 11). Philip Parker's Trick For Authoring Over 1 Million Books: Don't Write. HuffPost. https://www.huffpost.com/entry/philip-parker-books_n_2648820
  8. Becanovic, S. (2021, December 13). What is machine learning & AI training data? StageZero Technologies. https://stagezero.ai/blog/what-is-training-data/#:~:text=Artificial%20Intelligence%20(AI)%20and%20machine,the%20full%20potential%20of%20AI.
  9. How Much Data Is Needed For Machine Learning? (2022, December 15). Graphite Note. https://graphite-note.com/how-much-data-is-needed-for-machine-learning#:~:text=Generally%20speaking%2C%20the%20rule%20of,100%20rows%20for%20optimal%20results.
  10. Kaput, M. (2022, September 26). Natural Language Generation (NLG): Everything You Need to Know. Marketing AI Institute. https://www.marketingaiinstitute.com/blog/the-beginners-guide-to-using-natural-language-generation-to-scale-content-marketing
  11. Vargas, R. (2022, January 14). The Increase in Computer Power is Driving Applied AI. Encora. https://www.encora.com/insights/the-increase-in-computer-power-is-driving-applied-ai#:~:text=We've%20seen%20an%20increase,data%20into%20their%20training%20processes
  12. Moses, L. (2017, September 14). The Washington Post's robot reporter has published 850 articles in the past year. Digiday. https://digiday.com/media/washington-posts-robot-reporter-published-500-articles-last-year/
  13. Landymore, F. (2023, January 11). CNET Is Quietly Publishing Entire Articles Generated By AI. Futurism. https://futurism.com/the-byte/cnet-publishing-articles-by-ai
  14. Moon, M. (2023). CNET is reviewing its AI-written articles after being notified of serious errors. Engadget. https://www.engadget.com/cnet-reviewing-ai-written-articles-serious-errors-113041405.html
  15. Christian, J. (2023, January 23). CNET's AI Journalist Appears to Have Committed Extensive Plagiarism. Futurism. https://futurism.com/cnet-ai-plagiarism
  16. Shearer, E., & Tomasik, E. (2022, October 13). After increasing in 2020, layoffs at large U.S. newspapers and digital news sites declined in 2021. Pew Research Center. https://www.pewresearch.org/fact-tank/2022/10/13/after-increasing-in-2020-layoffs-at-large-u-s-newspapers-and-digital-news-sites-declined-in-2021/
  17. Wertheim, J. (2022, June 12). Local newsrooms strained by budget-slashing financial firms. 60 Minutes, CBS News. https://www.cbsnews.com/news/local-news-financial-firms-60-minutes-2022-06-12/
  18. From Pink Slips to Pink Slime: Transforming Media Labor in a Digital Age. (2015). The Communication Review. https://www.tandfonline.com/doi/abs/10.1080/10714421.2015.1031996?journalCode=gcrv20
  19. Greene, T. (2017, July 28). Face it, AI is better at data-analysis than humans. The Next Web. https://thenextweb.com/news/face-it-ai-is-better-at-data-analysis-than-humans
  20. AI enters the newsroom. (2023). Nieman Lab. https://www.niemanlab.org/2022/12/ai-enters-the-newsroom/
  21. AI enters the newsroom. (2023). Nieman Lab. https://www.niemanlab.org/2022/12/ai-enters-the-newsroom/
  22. Moon, M. (2023). CNET is reviewing its AI-written articles after being notified of serious errors. Engadget. https://www.engadget.com/cnet-reviewing-ai-written-articles-serious-errors-113041405.html
  23. Mahadevan, A. (2023, February 3). This newspaper doesn't exist: How ChatGPT can launch fake news sites in minutes. Poynter. https://www.poynter.org/fact-checking/2023/chatgpt-build-fake-news-organization-website/
  24. Dealing with propaganda, misinformation and fake news. (2014). Democratic Schools for All, Council of Europe. https://www.coe.int/en/web/campaign-free-to-speak-safe-to-learn/dealing-with-propaganda-misinformation-and-fake-news#:~:text=Propaganda%2C%20misinformation%20and%20fake%20news%20have%20the%20potential%20to%20polarise,trust%20in%20the%20democratic%20processes.
  25. The Danger of Fake News in the 2016 Election. (2016). Center for Information Technology and Society, UC Santa Barbara. https://www.cits.ucsb.edu/fake-news/danger-election
  26. What Changes When AI Is So Accessible That Everyone Can Use It? (2018, January 30). Harvard Business Review. https://hbr.org/2018/01/what-changes-when-ai-is-so-accessible-that-everyone-can-use-it
  27. Heather. (2021, November). What Are The Dangers of Fake News? PeoplesBank. https://www.peoplesbanknet.com/the-dangers-of-fake-news/
  28. Edmonds, R. (2015, July 28). Newspaper industry lost 3,800 full-time editorial professionals in 2014. Poynter. https://www.poynter.org/reporting-editing/2015/newspaper-industry-lost-3800-full-time-editorial-professionals-in-2014/
  29. Haywood, N. (2018, February 23). Four States Living Magazine. https://www.fourstatesliving.com/feature-stories/2018/1/31/journalism-is-an-art-form#:~:text=The%20art%20of%20storytelling%20is,discovered%20early%20on%20in%20life.
  30. Brambilla Hall, S. (2018, January 15). 7 challenges for AI in journalism. World Economic Forum. https://www.weforum.org/agenda/2018/01/can-you-tell-if-this-article-was-written-by-a-robot-7-challenges-for-ai-in-journalism/
  31. amlacey. (2015, October 20). Ethics of robot journalism: How Automated Insights poses issues for data collection and writing. Center for Journalism Ethics. https://ethics.journalism.wisc.edu/2015/10/20/ethics-of-robot-journalism-how-automatedinsights-poses-issues-for-data-collection-and-writing/
  32. A Simple Tactic That Could Help Reduce Bias in AI. (2020, November 4). Harvard Business Review. https://hbr.org/2020/11/a-simple-tactic-that-could-help-reduce-bias-in-ai
  33. PricewaterhouseCoopers. (2022). Understanding algorithmic bias and how to build trust in AI. PwC. https://www.pwc.com/us/en/tech-effect/ai-analytics/algorithmic-bias-and-trust-in-ai.html#:~:text=The%20short%20answer%3A%20People%20write,AI%20then%20automates%20and%20perpetuates.
  34. PricewaterhouseCoopers. (2022). Understanding algorithmic bias and how to build trust in AI. PwC. https://www.pwc.com/us/en/tech-effect/ai-analytics/algorithmic-bias-and-trust-in-ai.html#:~:text=The%20short%20answer%3A%20People%20write,AI%20then%20automates%20and%20perpetuates.
  35. Day 4: Understanding Our Bias and its Consequences. (2017). United Way for Southeastern Michigan. https://unitedwaysem.org/equity_challenge/day-4-understanding-our-bias-the-consequences-of-bias/#:~:text=Bias%20can%20be%20dangerous%20and,influence%20actions%20that%20are%20discriminatory
  36. Spohr, D. (2017). Fake news and ideological polarization: Filter bubbles and selective exposure on social media. Business Information Review, 34(3), 150–160.