
PyTorch object detection data processing (II): extracting hard samples and low-AP samples

2022-07-20 08:24:00 【Visual feast】

Abstract

Competition work involves many kinds of data processing: analyzing the image data, and then strengthening the categories whose AP turns out to be relatively low. Today I will explain two techniques I have used recently: hard sample extraction, and augmenting the data of low-AP categories.
A hard sample is one with a relatively large loss. Such samples account for a large share of the loss in each training batch, which makes it difficult for the loss to keep decreasing.

Hard sample extraction

I'm using the PyTorch version of EfficientDet, and the overall process is fairly simple. In the dataloader, change the __getitem__ function so that it also returns the image name. Because training runs in batches, the collect function that assembles each batch has to return the image names as well. Then, in train, the names of the images whose loss is greater than about 0.5 are recorded and written to a txt file.

In train.py, the image names are loaded together with each batch. I focus on the classification loss here, so during processing the names are simply written to a txt file.
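
The logging step in train.py can be sketched roughly as follows. This is a minimal sketch, not the original training code: the function name log_hard_samples and its parameters are hypothetical, and it assumes the training step can produce one classification loss value per image. If only a batch-mean loss is available, every name in the batch would have to be logged instead.

```python
def log_hard_samples(names, cls_losses, log_path, threshold=0.5):
    """Append the names of images whose classification loss exceeds threshold.

    names      -- image file names of the current batch (hypothetical input)
    cls_losses -- one loss value per image, in the same order as names
    Returns the list of names that were written to log_path.
    """
    hard = [name for name, loss in zip(names, cls_losses)
            if float(loss) > threshold]
    if hard:
        # Append mode, so names accumulate over all batches and epochs
        with open(log_path, "a", encoding="utf-8") as f:
            for name in hard:
                f.write(name + "\n")
    return hard
```

Calling it once per batch, for example log_hard_samples(batch_names, losses, "./haha.txt"), builds up a text file with one image name per line. The same name can appear several times, so the file needs deduplicating before use.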

In the dataset code, the image name is added to the sample that __getitem__ returns. One more place needs a small change: the data-loading code that assembles each batch. My earlier blog post on data loading covers that part, and it is easy to master.
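
The dataset change can be sketched as below. It is an illustration under assumptions rather than the original code: NamedDetectionDataset, its records argument and the collater function are hypothetical names. A real dataset would subclass torch.utils.data.Dataset, and a real collate function would also pad annotations and stack the images into tensors.

```python
class NamedDetectionDataset:
    """Stand-in for a torch.utils.data.Dataset subclass (hypothetical)."""

    def __init__(self, records):
        # records: list of (file_name, image, annotations) tuples
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        name, img, annot = self.records[idx]
        # Carry the file name in the sample so the training loop can log it
        return {"img": img, "annot": annot, "name": name}


def collater(batch):
    """Collate a list of samples, keeping names aligned with the images."""
    return {
        "img": [s["img"] for s in batch],
        "annot": [s["annot"] for s in batch],
        "name": [s["name"] for s in batch],
    }
```

Passing such a function as collate_fn to the DataLoader makes the names available in every batch under the "name" key, in the same order as the images.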
Finally, move the images listed in this txt file out of the training set into a separate folder. The code is as follows:

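import os
import shutil

import numpy as np

# Read the logged image names, one per line
with open('./haha.txt', 'r') as file:
    aa = [line.strip() for line in file if line.strip()]
print(len(aa))

# An image can be logged many times during training, so keep each name once
dd = list(np.unique(aa))
print(len(dd))

xml_train = './coco/train2017/'
xml_val = './coco/kunnan/'  # destination folder for the hard samples
os.makedirs(xml_val, exist_ok=True)

for random_file in dd:
    source_file = os.path.join(xml_train, random_file)
    target_file = os.path.join(xml_val, random_file)
    # Skip names that are missing from the source or already moved
    if os.path.exists(source_file) and not os.path.exists(target_file):
        shutil.move(source_file, xml_val)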

The code is simple, with only a few details to note. source_file is the relative path of each image, which is then moved into the xml_val directory. The xml files are handled the same way, and after that the json file is generated.
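
Moving the xml files could look roughly like this. It is a sketch under assumptions: move_matching_xml and its directory arguments are hypothetical, and it assumes each annotation file has the same base name as its image with an .xml extension.

```python
import os
import shutil


def move_matching_xml(image_names, xml_src_dir, xml_dst_dir):
    """Move the .xml annotation that shares a base name with each image.

    Returns the list of xml file names that were moved.
    """
    os.makedirs(xml_dst_dir, exist_ok=True)
    moved = []
    for image_name in image_names:
        xml_name = os.path.splitext(image_name)[0] + ".xml"
        src = os.path.join(xml_src_dir, xml_name)
        dst = os.path.join(xml_dst_dir, xml_name)
        # Skip annotations that are missing or were already moved
        if os.path.isfile(src) and not os.path.exists(dst):
            shutil.move(src, dst)
            moved.append(xml_name)
    return moved
```

Returning the moved names makes it easy to check that every hard-sample image found its annotation before the json file is regenerated.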

Low-AP category data augmentation

Training never gives a high AP for every category, so we use the json file to extract the images that contain the lower-AP categories:

import os
import shutil

from pycocotools.coco import COCO

coco = COCO('./test/all.json')

# Category ids with low AP. These match my own dataset; replace them with yours.
low_ap_cats = {0, 1, 5, 18, 19, 22, 27, 35, 36, 42}

# Collect the id of every image that has at least one low-AP annotation
items = set()
for ann in coco.loadAnns(coco.getAnnIds()):
    if ann['category_id'] in low_ap_cats:
        items.add(ann['image_id'])

# Look up the file names by the collected image ids
name = [img['file_name'] for img in coco.loadImgs(sorted(items))]

xml_train = './coco/train2017/'
xml_val = './coco/lowap/'  # destination folder for the low-AP images
os.makedirs(xml_val, exist_ok=True)

for i, file_name in enumerate(name):
    random_file = os.path.splitext(file_name)[0] + '.jpg'
    source_file = os.path.join(xml_train, random_file)
    target_file = os.path.join(xml_val, random_file)
    print(i)
    # Skip images that are missing from the source or already moved
    if os.path.exists(source_file) and not os.path.exists(target_file):
        shutil.move(source_file, xml_val)
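
The category ids in the script above are hard-coded. Assuming the per-category AP values from an evaluation run are available as a dict, the list could be derived instead of typed by hand. This is only a sketch: low_ap_categories is a hypothetical helper and the AP numbers are made up for illustration.

```python
def low_ap_categories(ap_by_category, threshold):
    """Return the sorted category ids whose AP is below threshold."""
    return sorted(cid for cid, ap in ap_by_category.items() if ap < threshold)


# Hypothetical per-category AP values, for illustration only
example_ap = {0: 0.21, 1: 0.35, 2: 0.78, 5: 0.40, 7: 0.66}
low_ap_cats = low_ap_categories(example_ap, threshold=0.5)
print(low_ap_cats)  # → [0, 1, 5]
```

The threshold is a judgment call; it depends on how the AP values are spread across the categories of the dataset at hand.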

That is the whole process, and it makes full use of the json file. Next, we augment the extracted images:

import glob
import os

import cv2
import numpy as np
import matplotlib.pyplot as plt
from albumentations import GridDropout


def imread(image):
    # OpenCV loads images as BGR; convert to RGB for augmentation and display
    image = cv2.imread(image)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    return image.astype(np.uint8)


def show(image):
    plt.imshow(image)
    plt.axis('off')
    plt.show()


# Path to data
data_folder = "./lowap/"
out_folder = "./haha/"
os.makedirs(out_folder, exist_ok=True)

# Read filenames in the data folder
filenames = glob.glob(f"{data_folder}*.jpg")
for b in filenames:
    print(b)
    a = imread(b)
    # p=1 applies the grid dropout to every image
    image2 = GridDropout(0.2, 10, p=1)(image=a)['image']
    dd = os.path.join(out_folder, os.path.basename(b))
    # cv2.imwrite expects BGR, so convert back before saving
    cv2.imwrite(dd, cv2.cvtColor(image2, cv2.COLOR_RGB2BGR))

I use the albumentations package because it makes augmentation convenient. This is offline augmentation: the augmented images are written to disk before training.


Summary

There are many data augmentation methods, and I used only one of them here. A more advanced option is CutMix, and you can also test other augmentations.


Copyright notice
This article was written by [Visual feast]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/201/202207190501377989.html