Programming/AI_DeepLearning

"Korea's generative AI smartphone market is growing fast in 2024": IDC Korea

홍반장水_ 2024. 9. 26. 09:39

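Smartphone shipments in Korea reached about 2.99 million units in the second quarter of this year, up 6.8% year over year, IDC said on the 25th. Economic uncertainty is dampening overall smartphone demand, but IDC's analysis is that strong demand for AI-equipped flagship smartphones persists.

According to the market research firm, flagship products priced at USD 800 or more accounted for 62.3% of the market, up 5.3 percentage points from the same period a year earlier. The firm explained that AI features such as real-time translation, text summarization, and simplified search are built into the phone's default functions, so they can be used without installing a separate app or going through complicated steps, and that this drew consumers' interest.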
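The figures above mix a percentage change (shipments) with a percentage-point change (share), which are easy to confuse. The short sketch below back-calculates what the quoted numbers imply for the year-earlier quarter. The variable names are made up for illustration, and the results are derived only from the rounded figures in the article, not from IDC's underlying data.

```python
# Figures quoted in the article (Korea, Q2 2024).
shipments_2024_q2 = 2_990_000       # "about 2.99 million units"
yoy_growth = 0.068                  # up 6.8% year over year
flagship_share_2024 = 62.3          # percent of the market
flagship_share_change_pp = 5.3      # change in percentage points

# A percentage change divides out: previous = current / (1 + growth).
implied_shipments_2023_q2 = shipments_2024_q2 / (1 + yoy_growth)

# A percentage-point change subtracts: previous share = current share - change.
implied_flagship_share_2023 = flagship_share_2024 - flagship_share_change_pp

# The same share gain expressed as a relative (percentage) change.
relative_share_growth = flagship_share_change_pp / implied_flagship_share_2023

print(f"Implied Q2 2023 shipments: {implied_shipments_2023_q2:,.0f}")      # roughly 2.80 million
print(f"Implied Q2 2023 flagship share: {implied_flagship_share_2023:.1f}%")
print(f"Relative growth in flagship share: {relative_share_growth:.1%}")
```

In other words, a 5.3 percentage-point gain from a base of about 57% is a relative increase of a little over 9%, which is a different statement from "5.3% growth."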

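In addition, as major brands launched 5G flagship and mid-to-low-priced smartphones, the 5G share rose to 89.1%.

By contrast, the Korean foldable market shipped about 60,000 units, down sharply from a year earlier. IDC's analysis is that, with Samsung Electronics about to launch new foldables in the third quarter, anticipation for new models with AI features built up and some buyers held off, so market demand plunged for a second consecutive quarter.

Kang Ji-hae, the researcher responsible for mobile phone market research at IDC Korea, said, “As the on-device AI boom accelerates, market competition is intensifying, and the overall smartphone market is shifting entirely toward AI smartphones. Annual shipments of generative AI smartphones in Korea are projected to fall short of 9.5 million units in 2024.”

IDC defines a generative AI smartphone as a model equipped with a chipset (SoC) that can run on-device GenAI faster and more efficiently by using a neural processing unit (NPU) capable of at least 30 TOPS (tera operations per second) on 8-bit integer (int-8) data.

https://www.ciokorea.com/news/351117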
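IDC's definition amounts to a threshold test on NPU throughput, which can be sketched as a small predicate. The `Chipset` class, its field names, and the two example chipsets are hypothetical; only the 30 TOPS int-8 threshold comes from the article.

```python
from dataclasses import dataclass

MIN_NPU_INT8_TOPS = 30.0  # threshold quoted in the article


@dataclass(frozen=True)
class Chipset:
    name: str
    npu_int8_tops: float  # NPU throughput measured on int-8 data


def is_genai_smartphone_chipset(chip: Chipset) -> bool:
    """True if the chipset meets the NPU threshold in IDC's definition."""
    return chip.npu_int8_tops >= MIN_NPU_INT8_TOPS


# Made-up chipsets for illustration only.
examples = [Chipset("soc-a", 45.0), Chipset("soc-b", 12.5)]
qualifying = [c.name for c in examples if is_genai_smartphone_chipset(c)]
print(qualifying)
```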

What AI Cannot Replace in Programming

홍반장水_ 2024. 9. 14. 16:47

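Even if generative AI is good at handling tedious tasks and finding errors, programmers' expertise and intuition will always be needed.

Simon Willison, founder of Datasette, said that “now is a better time than ever to learn programming.” Not because AI will do the coding for you. Quite the opposite. “Large language models flatten the learning curve, making it easier for young developers to catch up,” he said. You should not forget how to code, but you can use generative AI to enhance the developer experience regardless of career level.

In praise of ‘the will to learn’
I enjoy following Willison's views on generative AI. He is a developer who thinks carefully about the topic. A piece by Mike Loukides of O'Reilly Media is also worth reading because it distills the essentials of a big topic. On generative AI and coding, Loukides reminds us that “writing a really good prompt is harder than you think.” “To write prompts well, you need to build expertise in what the prompt is for,” he said. In other words, you first have to be a ‘good’ programmer.

Loukides advised that “if you think of AI as ‘a repository of expertise and wisdom that humans cannot attain,’ you will not be able to use it productively.” To use tools such as AWS CodeWhisperer or Google Codey effectively, you have to coach them toward the output you expect. And to walk an AI step by step through how to solve a development problem, you first have to understand the problem deeply and then draw the response out of the AI.

Developers also need to be able to evaluate when the AI is wrong, which requires a certain level of expertise. As Willison noted, coding assistants are expected to work more actively on projects and help more, but that will not eliminate the need for developers to understand the code. Nor would anyone want it to. Which brings us back to Willison's first point.

Learning to code with AI
For developers new to a particular language, framework, or database, the learning curve can be steep. For example, “you miss a semicolon, get a bizarre error message, and then it takes two hours to track the error down,” Willison said. Understandably, this can lead students to conclude they are not smart enough to learn programming and give up.

This is exactly where AI assistants can step in. “You shouldn't need a computer science degree to have a computer do tedious things for you,” Willison said. LLM-based assistants such as ChatGPT can automate tedious tasks. GitHub engineer Jaana Dogan stressed that “people are so focused on code generation that they completely forget that LLMs are useful for code analysis.” AI does not have to do everything. In Willison's argument, AI can be used to automate discrete, tedious tasks that do not make or break an application but can undermine a developer's confidence. That is all the more true when developers are asked to learn and perform every aspect of programming even though a coding assistant could handle the tedious parts.

As always, the best way to start developing software with generative AI is simply to start. Start small by automating simple tasks that you understand but do not need to write over and over. The time saved can go toward learning how to solve harder coding problems. As your expertise grows, you will be able to automate those tasks too.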

https://www.ciokorea.com/news/311336

AI Coding Errors Should Be Managed by Human Programmers

홍반장水_ 2024. 9. 13. 13:17

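It is well known that software development work done with generative AI contains mistakes fundamentally different from those human programmers make. Yet most enterprises' plans for fixing AI coding mistakes rely on simply putting experienced human programmers in the loop.

Experienced human programmers know intuitively the kinds of mistakes human programmers make and the shortcuts they take. But spotting the kinds of mistakes that arise when software creates software requires separate training.

The discussion gained momentum after AWS CEO Matt Garman said he expects most developers to no longer be coding as early as 2026.

Many vendors in the development tools space have argued that the problem can be solved by using AI apps to manage AI coding apps. That is a second train wreck waiting to happen. Even financial giant Morgan Stanley is weighing how to use AI to manage AI.

Realistically, the only safe and remotely viable approach is to train programming managers to understand the nature of generative AI coding errors. In fact, given how different AI coding errors are, it may be better to train new people, ones not already accustomed to finding human coding mistakes, as AI coding managers.

Part of the problem is human nature. People tend to magnify differences and misinterpret them. When managers see a person or an AI make a mistake they themselves would never make, they tend to assume that whoever made it is inferior to them at coding.

But consider the analogy of autonomous vehicles. Statistically, self-driving cars are far safer than human-driven cars. Automated systems do not get tired, do not get drunk, and do not turn deliberately reckless.

But self-driving cars are not perfect. And when one makes a mistake such as slamming at full speed into a truck stopped in traffic, humans react with, “I would never have done something that stupid... I can't trust AI.” (The Waymo parked-car disaster is a must-see video.)

The fact that self-driving cars make odd mistakes does not mean they are less safe than human drivers. Human nature, however, cannot reconcile that difference.

The same goes for managing coding. Generative AI coding models can be highly efficient, but when they go wrong, they can veer off in wild directions.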

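AI is a crazy alien programmer

Dev Nag, CEO of SaaS company QueryPal, has been working with generative AI coding and feels that many enterprise IT executives are unprepared for how different this new technology is.

“It made a lot of strange mistakes, like an alien from another planet. The code misbehaves in ways human developers would not. It heads in odd directions, like an alien intelligence that does not think the way we do. AI will pathologically find ways to game the system,” Nag said.

Just ask Tom Taulli, who has written several AI programming books, including this year's ‘AI-Assisted Programming.’

“For example, you can ask an LLM to write code, and sometimes it will make up a framework or an imaginary library or module to do what you want,” Taulli said. (Taulli explained that the LLM is not actually creating a new framework, only pretending to.)

“Unless [a human programmer] is insane, they are not going to make up an imaginary library or module out of thin air,” Taulli pointed out.

When that happens, it is easy to spot for anyone who looks. “If you try to install it, you'll find there is nothing there. In that case the IDE and compiler raise errors,” Taulli explained.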
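The check Taulli describes can be automated before generated code is ever run. The following is a minimal sketch, not a tool mentioned in the article: it parses a snippet with the standard `ast` module and uses `importlib.util.find_spec` to list imported top-level modules that cannot be resolved in the current environment. The function name and the sample snippet are made up for illustration. An unresolved module may simply be uninstalled rather than imaginary, so the result is a prompt for a human to look, not a verdict.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Top-level modules imported by `source` that cannot be found locally."""
    tree = ast.parse(source)
    wanted = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            wanted.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            wanted.add(node.module.split(".")[0])
    # find_spec returns None when no installed module has that top-level name.
    return sorted(name for name in wanted if importlib.util.find_spec(name) is None)


# Hypothetical LLM output that imports one real and one made-up module.
generated = """
import json
from totally_made_up_framework_xyz import magic_solver
"""

missing = unresolved_imports(generated)
print(missing)
```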

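Handing the coding of an entire application, including creative control of the executable, to a system that periodically hallucinates seems like a terrible approach.

A far better way to harness the efficiency of generative AI coding is to use it as a tool that helps programmers get more done. Taking humans out of the loop, as AWS's Garman suggested, would be tantamount to suicide.

What if a generative AI coding tool, left to roam freely, creates a backdoor so it can make fixes later without bothering a human, and that backdoor turns out to be one attackers can use as well?

Enterprises tend to be very effective at testing an app's functionality, especially for apps developed in-house, to confirm that it works as intended. Where app testing tends to fall short is in checking whether the app can do things it should not be able to do. That is the penetration testing mindset.

But in the reality of generative AI coding, this pen-testing approach has to become the default. And it has to be managed by supervisors well trained in the wacky world of generative AI mistakes.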
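The difference between functional testing and the pen-test mindset can be shown on a small scale. In the sketch below, `safe_join` is a hypothetical helper of the kind an AI assistant might be asked to write; the first check confirms it does what it should, while the remaining checks confirm it refuses what it should not do (escaping the upload directory). The paths and names are illustrative assumptions, and this is not a complete defense against path traversal.

```python
from pathlib import PurePosixPath


def safe_join(base: str, user_path: str) -> str:
    """Join user_path under base, refusing anything that could escape base."""
    candidate = PurePosixPath(user_path)
    if candidate.is_absolute() or ".." in candidate.parts:
        raise ValueError(f"rejected path: {user_path!r}")
    return str(PurePosixPath(base) / candidate)


def is_rejected(user_path: str) -> bool:
    """True if safe_join refuses the given path."""
    try:
        safe_join("/srv/uploads", user_path)
    except ValueError:
        return True
    return False


# Functional check: the helper does what it is supposed to do.
works = safe_join("/srv/uploads", "reports/q2.csv") == "/srv/uploads/reports/q2.csv"

# Pen-test-style checks: the helper must NOT allow these.
hostile_inputs = ["../etc/passwd", "/etc/passwd", "a/../../b"]
blocked = [is_rejected(p) for p in hostile_inputs]

print(works, blocked)
```

A test suite that contains only the first kind of check would pass even if the traversal guard were missing, which is the gap the article warns about.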

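Enterprise IT is certainly looking forward to a more efficient coding future. Programmers will take on a more strategic role, focusing more on what an app should do and why, and spending less time laboriously coding every line.

But that efficiency and those strategic gains come at a steep price: enterprises will have to hire better, differently trained people to make sure AI-generated code stays on track.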

https://www.itworld.co.kr/topnews/350221

OpenAI: “ChatGPT has 200 million weekly active users”... 100% growth in a year

홍반장水_ 2024. 9. 2. 10:14

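OpenAI said ChatGPT's weekly active users have surpassed 200 million, double the level of a year ago.

According to Axios, 92% of Fortune 500 companies use OpenAI products. In addition, automated API usage has doubled since GPT-4o mini launched in July this year.

OpenAI CEO Sam Altman said, “People now use our tools as part of their daily lives, and this is bringing real change in areas such as healthcare and education,” adding, “They are helping across a wide range of areas, from assisting with everyday tasks to solving hard problems and unlocking creativity.”

OpenAI retains its lead in the generative AI chatbot market. But it faces intensifying competition as tech companies update their services to gain share.

On the same day, Meta said adoption of its open source Llama models has surged. Since the release of Llama 3.1, the company said, usage at major cloud service providers doubled between May and July of this year.

Competition for users among Microsoft, Google, OpenAI, and Meta is expected to intensify further.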

https://www.mk.co.kr/news/it/11105925

AI That Cannot Remember What It Learned Before

홍반장水_ 2024. 8. 30. 10:03

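Researchers at the University of Alberta in Canada recently published a paper in Nature proposing a way to overcome the limitations of artificial neural networks. The paper's summary of those limitations caught my eye more than the results themselves, so here is a brief recap.

You have probably heard the term ‘neural network.’ It is a kind of system inspired by the human brain, and LLMs are built on such networks. Much as ‘neurons’ in the brain are connected, a neural network passes input data through several stages and produces an answer based on weights. Just as brain function is said to be better the stronger and more numerous the connections between neurons, the same goes for neural networks.

Between a neural network's input and output sit what are called ‘hidden layers,’ and the more data the network learns from and the more computation it does there, the better the output. This is a simplification, of course. Just as studying too much information at once overloads the brain, simply adding more hidden layers can actually slow computation down.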
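To make the idea of weights and a hidden layer concrete, here is a minimal forward pass written in plain Python. It is a sketch under stated assumptions: the layer sizes, the ReLU activation, and every weight value are chosen by hand for illustration, whereas a real network learns its weights from data.

```python
def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    """Activation function: negative values become zero."""
    return [max(0.0, v) for v in values]


# Hypothetical hand-picked weights (a trained network would learn these).
W_HIDDEN = [[0.5, -1.0], [1.5, 0.25], [-0.75, 0.5]]  # 2 inputs -> 3 hidden units
B_HIDDEN = [0.0, -0.5, 0.25]
W_OUT = [[1.0, 0.5, -1.0]]                            # 3 hidden units -> 1 output
B_OUT = [0.1]


def forward(inputs):
    hidden = relu(dense(inputs, W_HIDDEN, B_HIDDEN))  # the hidden layer
    return dense(hidden, W_OUT, B_OUT)[0]             # the output layer


output = forward([1.0, 2.0])
print(round(output, 4))  # approximately 0.35
```

Adding more hidden layers means chaining more `dense` calls, which is why depth increases the amount of computation per input.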

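Neural networks, or more precisely artificial neural networks, are now widely applied in machine learning. They also have limitations. They may be modeled on the brain, but a biological brain and a mechanical neural network can hardly be the same, and shortcomings have been reported in continual learning in particular.

Humans can adapt and respond effectively to new information without erasing previously acquired information and knowledge. Biological neural networks learn by balancing ‘stability,’ the ability to remember past data, against ‘plasticity,’ the ability to learn new concepts.

Artificial neural networks, however, are said to be vulnerable to ‘catastrophic forgetting’: when faced with learning a new task, they lose abilities they had previously learned. In severe cases, the network is even said to lose its ability to learn altogether.

Consider the University of Alberta researchers' example. There is a video game called ‘Pong,’ in which two sides hit a ball back and forth as in table tennis. If you train a neural network to score well at Pong and then train it on the shooting game ‘Galaga,’ its Pong score drops sharply. The more new games it learns, the more it loses nearly everything about how to play the first one.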
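The effect can be reproduced in miniature. The sketch below is a toy stand-in, not the researchers' experiment: a model with a single weight is trained by gradient descent on one task and then on a conflicting task, with no replay of the first. The two "tasks" (fitting y = 2x and y = -3x), the learning rate, and the epoch count are all assumptions made for illustration.

```python
def train(w, data, lr=0.1, epochs=200):
    """Plain gradient descent on mean squared error for the one-weight model y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def mse(w, data):
    """Mean squared error of the model y = w * x on the given data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


xs = [-1.0, -0.5, 0.5, 1.0]
task_a = [(x, 2.0 * x) for x in xs]    # stand-in for the first game: target y = 2x
task_b = [(x, -3.0 * x) for x in xs]   # stand-in for the second game: target y = -3x

w = train(0.0, task_a)
loss_a_before = mse(w, task_a)   # near zero: task A has been learned

w = train(w, task_b)             # sequential training, task A is never revisited
loss_a_after = mse(w, task_a)    # large: the weight that solved task A was overwritten

print(f"Task A loss before learning B: {loss_a_before:.6f}")
print(f"Task A loss after learning B:  {loss_a_after:.3f}")
```

Because the same weight has to serve both tasks, learning the second one necessarily destroys the solution to the first. Real networks have many more weights, but sequential training can overwrite them in the same way.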

25 Open Source AI Tools to Cut Your Development Time in Half

홍반장水_ 2024. 8. 27. 08:52

https://jozu.com/blog/25-open-source-ai-tools-to-cut-your-development-time-in-half/

Each stakeholder in an ML/AI project needs specialized tools to manage the project's stages efficiently, from data preparation and model development to deployment and monitoring. Many turn to open source tools, which have been a significant catalyst for the advancement and ease of AI projects. As a result, numerous open source AI tools have emerged over the years, making it challenging to pick from the available options.

This article highlights some factors to consider when picking open source tools and introduces you to 25 open-source options that you can use for your AI project.

Picking open source tools for an AI project

The open source tooling model has allowed companies to develop diverse ML tools to help you handle particular problems in an AI project. The AI tooling landscape is already quite saturated with tools, and the abundance of options makes tool selection difficult. Some of these tools even provide similar solutions. You may be tempted to lean toward adopting tools just because of the enticing features they present. However, there are other crucial factors that you should consider before selecting a tool, which include:

Popularity
Impact
Innovation
Community engagement
Relevance to emerging AI trends.

Popularity

Widely adopted tools often indicate active development, regular updates, and strong community support, ensuring reliability and longevity.

Impact

A track record of addressing pain points, delivering measurable improvements, supporting long-term project sustainability, and adapting to an AI project's evolving needs is a good indicator of an impactful tool that stakeholders will want to use.

Innovation

Tools that embrace more modern technologies and offer unique features demonstrate a commitment to continuous improvement and have the potential to drive advancements and unlock new possibilities.

Community engagement

Active community engagement fosters collaboration, provides support, and ensures a tool's continued relevance and improvement.

Relevance to emerging AI trends

Tools aligned with emerging trends like LLMs enable organizations to leverage the latest capabilities, ensuring their projects remain at the forefront of innovation.

25 open source tools for your AI project

Based on these factors, here are 25 tools that you and the different stakeholders on your team can use for various stages in your AI project.

1. KitOps

Multiple stakeholders are involved in the machine learning development lifecycle, and they need different MLOps tools and environments at various stages of an AI project, which makes it hard to guarantee an organized, portable, transparent, and secure model development pipeline.

This introduces opportunities for model lineage breaks and accidental or malicious model tampering or modifications during model development. Since the contents of a model are a "black box," without efficient storage and lineage it is impossible to know if a model's or model artifact's content has been tampered with between model development, staging, deployment, and retirement pipelines.

KitOps provides AI project stakeholders with a secure package called ModelKit that they can use to share and manage models, code, metadata, and artifacts throughout the ML development lifecycle.

The ModelKit is an immutable OCI-standard artifact that leverages normal container-native technologies (similar to Docker and Kubernetes), making it seamlessly interoperable and portable across various stakeholders using common software tools and environments. As an immutable package, ModelKit is tamper-proof. This tamper-proof property provides stakeholders with a versioning system that tracks every single update to any of its content (i.e., models, code, metadata, and artifacts) throughout the ML development and deployment pipelines.
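One way to see why an immutable, content-addressed package makes tampering detectable is to hash its contents. The sketch below is not KitOps code and does not use its CLI or file format; it only illustrates the general idea behind OCI-style `sha256:` digests using Python's standard `hashlib`. The file names and byte contents are made up.

```python
import hashlib


def digest(content: bytes) -> str:
    """Content address in the 'sha256:<hex>' form used for OCI digests."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


# Hypothetical package contents standing in for models, code, and metadata.
package = {
    "model.bin": b"\x00\x01weights-v1",
    "train.py": b"print('train')",
}

# Record a digest for every file when the package is created.
manifest = {name: digest(data) for name, data in package.items()}


def tampered(files: dict, recorded: dict) -> list:
    """Names whose current digest no longer matches the recorded one."""
    return sorted(name for name, data in files.items() if digest(data) != recorded.get(name))


clean = tampered(package, manifest)

# Simulate a modification made somewhere between development and deployment.
modified = {**package, "model.bin": b"\x00\x01weights-v1-backdoored"}
changed = tampered(modified, manifest)

print(clean, changed)
```

Any change to a file changes its digest, so a stakeholder who holds the original manifest can tell that the content differs without inspecting the model itself.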

2. LangChain

LangChain is a machine learning framework that enables ML engineers and software developers to build end-to-end LLM applications quickly. Its modular architecture allows them to easily mix and match its extensive suite of components to create custom LLM applications.

LangChain simplifies the development and deployment stages of LLM applications with its ecosystem of interconnected parts, consisting of LangSmith, LangServe, and LangGraph. Together, they enable ML engineers and software developers to build robust, diverse, and scalable LLM applications efficiently.

LangChain enables professionals without a strong AI background to easily build an application with large language models (LLMs).

3. Pachyderm

Pachyderm is a data versioning and management platform that enables engineers to automate complex data transformations. It uses a data infrastructure that provides data lineage via a data-driven versioning pipeline. The version-controlled pipelines are automatically triggered based on changes in the data. It tracks every modification to the data, making it simple to duplicate previous results and test with various pipeline versions.

Pachyderm's data infrastructure provides "data-aware" pipelines with versioning and lineage.

4. ZenML

ZenML is a structured MLOps framework that abstracts the creation of MLOps pipelines, allowing data scientists and ML engineers to focus on the core steps of data preprocessing, model training, evaluation, and deployment without getting bogged down in infrastructure details.

ZenML framework abstracts MLOps infrastructure complexities and simplifies the adoption of MLOps, making the AI project components accessible, reusable, and reproducible.

5. Prefect

Prefect is an MLOps orchestration framework for machine learning pipelines. It uses the concepts of tasks (individual units of work) and flows (sequences of tasks) to construct an ML pipeline for running the different steps of ML code, such as feature engineering and training. This modular structure helps ML engineers create and manage complex ML workflows more simply.

Prefect simplifies data workflow management and provides robust error handling, state management, and extensive monitoring.

6. Ray

Ray is a distributed computing framework that makes it easy for data scientists and ML engineers to scale machine learning workloads during model development. It simplifies scaling computationally intensive workloads, like loading and processing extensive data or deep learning model training, from a single machine to large clusters.

Ray's core distributed runtime makes it easy to scale ML workloads.

7. Metaflow

Metaflow is an MLOps tool that enhances the productivity of data scientists and ML engineers with a unified API. The API offers a code-first approach to building data science workflows, and it contains the whole infrastructure stack that data scientists and ML engineers need to execute AI projects from prototype to production.

8. MLflow

MLflow allows data scientists and engineers to manage model development and experiments. It streamlines your entire model development lifecycle, from experimentation to deployment.

MLflow’s key features include:
MLflow tracking: It provides an API and UI to record and query your experiment, parameters, code versions, metrics, and output files when training your machine learning model. You can then compare several runs after logging the results.

MLflow projects: It provides a standard reusable format to package data science code and includes an API and CLI to run projects and chain them into workflows. Any Git repository or local directory can be treated as an MLflow project.

MLflow models: It offers a standard format to deploy ML models in diverse serving environments.

MLflow model registry: It provides you with a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of a model. It also enables model lineage (from your model experiments and runs), model versioning, and development stage transitions (i.e., moving a model from staging to production).

9. Kubeflow

Kubeflow is an MLOps toolkit for Kubernetes. It is designed to simplify the orchestration and deployment of ML workflows on Kubernetes clusters. Its primary purpose is to make complex ML systems easier to manage, portable, and scalable across different infrastructures.

Kubeflow is a key player in the MLOps landscape, offering a robust and flexible platform for building, deploying, and managing machine learning systems on Kubernetes. This unified platform for developing, deploying, and managing ML models enables collaboration among data scientists, ML engineers, and DevOps teams.

10. Seldon Core

Seldon Core is an MLOps platform that simplifies the deployment, serving, and management of machine learning models by converting ML models (TensorFlow, PyTorch, H2O, etc.) or language wrappers (Python, Java, etc.) into production-ready REST/gRPC microservices. Think of them as pre-packaged inference servers or custom servers. Seldon Core also enables the containerization of these servers and offers out-of-the-box features like advanced metrics, request logging, explainers, outlier detectors, A/B tests, and canaries.

Seldon Core's solution focuses on model management and governance. Its adoption is geared toward ML and DevOps engineers, specifically for model deployment and monitoring, instead of small data science teams.

11. DVC (Data Version Control)

Implementing version control for machine learning projects entails managing not only code but also datasets, ML models, performance metrics, and other development-related artifacts. DVC's purpose is to bring best practices from software engineering, like version control and reproducibility, to the world of data science and machine learning. It lets data scientists and ML engineers track changes to data and models the way Git does for code, and it runs on top of any Git repository. It also enables the management of model experiments.

DVC's integration with Git makes it easier to apply software engineering principles to data science workflows.

12. Evidently AI

Evidently AI is an observability platform designed to analyze and monitor production machine learning (ML) models. Its primary purpose is to help ML practitioners understand and maintain the performance of their deployed models over time. Evidently provides a comprehensive set of tools for tracking key model performance metrics, such as accuracy, precision, and recall, and for detecting drift. It also enables stakeholders to generate interactive reports and visualizations that make it easy to identify issues and trends.

13. Mage AI

Mage AI is a data transformation and integration framework that allows data scientists and ML engineers to build and automate data pipelines without extensive coding. Data scientists can easily connect to their data sources, ingest data, and build production-ready data pipelines within Mage notebooks.

14. MLRun

MLRun provides a serverless technology for orchestrating end-to-end MLOps systems. The serverless platform converts ML code into scalable and managed microservices. This streamlines the development and management pipelines of data scientists and of ML, software, and DevOps/MLOps engineers throughout the entire machine learning (ML) lifecycle, across their various environments.

15. Kedro

Kedro is an ML development framework for creating reproducible, maintainable, modular data science code. Kedro improves the AI project development experience via data abstraction and code organization. Using lightweight data connectors, it provides a centralized data catalog to manage and track datasets throughout a project. This lets data scientists focus on building production-level code through Kedro's data pipelines, and lets other stakeholders use the same pipelines in different parts of the system.

Kedro focuses on data pipeline development by enforcing software engineering best practices for data scientists.

16. WhyLogs

WhyLogs by WhyLabs is an open-source data logging library designed for machine learning (ML) models and data pipelines. Its primary purpose is to provide visibility into data quality and model performance over time.

With WhyLogs, MLOps engineers can efficiently generate compact summaries of datasets (called profiles) that capture essential statistical properties and characteristics. These profiles track changes in datasets over time, helping detect data drift – a common cause of model performance degradation. It also provides tools for visualizing key summary statistics from dataset profiles, making it easy to understand data distributions and identify anomalies.
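The idea of a profile, a compact statistical summary kept instead of the raw rows, can be sketched with the standard library. This is an illustration of the concept only and does not use the whylogs API. The feature values, the `mean_shift` measure, and the threshold of 3.0 are assumptions chosen for the example.

```python
import statistics


def profile(values):
    """Compact summary of a numeric column: a few statistics instead of the raw rows."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def mean_shift(reference, current):
    """How far the current mean moved, measured in reference standard deviations."""
    return abs(current["mean"] - reference["mean"]) / reference["stdev"]


# Hypothetical values of one feature collected in two time windows.
last_week = profile([20.1, 19.8, 20.5, 20.0, 19.6, 20.3])
this_week = profile([23.9, 24.4, 24.1, 23.7, 24.6, 24.0])

DRIFT_THRESHOLD = 3.0  # arbitrary cutoff chosen for this illustration
drifted = mean_shift(last_week, this_week) > DRIFT_THRESHOLD

print(last_week)
print(this_week)
print("drift detected:", drifted)
```

Comparing two small dictionaries is enough to flag the shift, which is why profiles are cheap to store and compare over time.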

17. Feast

Defining, storing, and accessing features for model training and online inference in silos (i.e., from different locations) can lead to inconsistent feature definitions, data duplication, complex data access and retrieval, etc. Feast solves the challenge of stakeholders managing and serving machine learning (ML) features in development and production environments.

Feast is a feature store that bridges the gap between data and machine learning models. It provides a centralized repository for defining feature schemas, ensuring consistency across different teams and projects. This can ensure that the feature values used for model inference are consistent with the state of the feature at the time of the request, even for historical data.

Feast is a centralized repository for managing, storing, and serving features, ensuring consistency and reliability across training and serving environments.
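Consistency "with the state of the feature at the time of the request" is often described as a point-in-time lookup: for any timestamp, return the most recent value recorded at or before it, never a later one. The sketch below shows that lookup in plain Python with `bisect`. It does not use the Feast API, and the integer timestamps and feature values are made up.

```python
from bisect import bisect_right

# Hypothetical history of one feature for one entity: (timestamp, value), sorted by timestamp.
history = [(100, 0.10), (200, 0.25), (300, 0.40)]


def value_as_of(history, ts):
    """Latest feature value recorded at or before ts, or None if none exists yet."""
    timestamps = [t for t, _ in history]
    idx = bisect_right(timestamps, ts)
    return history[idx - 1][1] if idx else None


# A training example labeled at time 250 must see the value from time 200,
# not the later value from time 300, which would leak future information.
print(value_as_of(history, 250))
print(value_as_of(history, 300))
print(value_as_of(history, 50))
```

Using the same lookup rule for training data and for online requests is what keeps feature values consistent between the two environments.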

18. Flyte

Data scientists and data and analytics pipeline engineers typically rely on ML and platform engineers to transform models and training pipelines into production-ready systems.

Flyte empowers data scientists and data and analytics engineers with the autonomy to work independently. It provides them with a Python SDK for building workflows, which can then be effortlessly deployed to the Flyte backend. This simplifies the development, deployment, and management of complex ML and data workflows by building and executing reliable and reproducible pipelines at scale.

19. Featureform

The ad-hoc practice of data scientists developing features for model development in isolation makes it difficult for other AI project stakeholders to understand, reuse, or build upon existing work. This leads to duplicated effort, inconsistencies in feature definitions, and difficulties in reproducing results.

Featureform is a virtual feature store that streamlines data scientists' ability to manage and serve features for machine learning models. It acts as a "virtual" layer over existing data infrastructure like Databricks and Snowflake. This allows data scientists to engineer and deploy features directly to the data infrastructure for other stakeholders. Its structured, centralized feature repository and metadata management approach empower data scientists to seamlessly transition their work from experimentation to production, ensuring reproducibility, collaboration, and governance throughout the ML lifecycle.

20. Deepchecks

Deepchecks is an ML monitoring tool for continuously testing and validating machine learning models and data from an AI project's experimentation to the deployment stage. It provides a wide range of built-in checks to validate model performance, data integrity, and data distribution. These checks help identify issues like model bias, data drift, concept drift, and leakage.

21. Argo

Argo provides a Kubernetes-native workflow engine for orchestrating parallel jobs on Kubernetes. Its primary purpose is to streamline the execution of complex, multi-step workflows, making it particularly well-suited for machine learning (ML) and data processing tasks. It enables ML engineers to define each step of the ML workflow (data preprocessing, model training, evaluation, deployment) as individual containers, making it easier to manage dependencies and ensure reproducibility.

Argo workflows can be defined either as a sequence of tasks (steps) or as a directed acyclic graph (DAG), in which each node represents a step in the workflow (typically a containerized task) and each edge represents a dependency between steps.
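How a DAG of steps turns into an execution plan can be sketched with Python's standard `graphlib` module. This is not Argo's YAML format or its scheduler; the step names and dependencies are a hypothetical ML workflow, and the loop simply groups steps whose dependencies are all finished, which is the set a workflow engine could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical ML workflow: each step maps to the set of steps it depends on.
workflow = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
    "report": {"evaluate"},
}

sorter = TopologicalSorter(workflow)
sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic

batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
    batches.append(ready)
    sorter.done(*ready)                 # mark them finished to unlock the next steps

print(batches)
```

Here `deploy` and `report` land in the same batch because both depend only on `evaluate`, so nothing forces one to wait for the other.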

22. Deep Lake

Deep Lake (formerly Activeloop Hub) is an ML-specific database tool designed to act as a data lake for deep learning and a vector store for RAG applications. Its primary purpose is accelerating model training by providing fast and efficient access to large-scale datasets, regardless of format or location.

23. Hopsworks feature store

Advanced MLOps pipelines with at least an MLOps maturity level 1 architecture require a centralized feature store. Hopsworks is a good fit as a feature store for such an architecture. It provides an end-to-end solution for managing the ML feature lifecycle, from data ingestion and feature engineering to model training, deployment, and monitoring. This facilitates feature reuse, consistency, and faster model development.

24. NannyML

NannyML is a Python library specialized in post-deployment monitoring and maintenance of machine learning (ML) models. It enables data scientists to detect and address silent model failure, estimate model performance without immediate ground truth data, and identify data drift that might be responsible for performance degradation.

25. Delta Lake

Delta Lake is a storage layer framework that provides reliability to data lakes. It addresses the challenges of managing large-scale data in lakehouse architectures, where data is stored in an open format and used for various purposes, like machine learning (ML). Data engineers can build real-time pipelines or ML applications using Delta Lake because it supports both batch and streaming data processing. It also brings ACID (atomicity, consistency, isolation, durability) transactions to data lakes, ensuring data integrity even with concurrent reads and writes from multiple pipelines.

Considering factors like popularity, impact, innovation, community engagement, and relevance to emerging AI trends can help guide your decision when picking open source AI/ML tools, especially for those offering the same value proposition. In some cases, such tools may have different ways of providing solutions for the same use case or possess unique features that make them perfect for a specific project use case.
